We study the problem of separating the data produced by a given quantum measurement (on 
states from a memoryless source which is unknown except for its average state), described by a 
positive operator valued measure (POVM), into a "meaningful" (intrinsic) and a "not meaningful" 
(extrinsic) part. 

We are able to give an asymptotically tight separation of this form, with the "intrinsic" data 
quantified by the Holevo mutual information of a certain state ensemble associated to the POVM 
and the source, in a model that can be viewed as the asymptotic version of the convex decomposition 
of POVMs into extremal ones. This result is applied to a similar separation theorem for quantum 
instruments and quantum operations, in their Kraus form. 

Finally we comment on links to related subjects: we stress the difference between data and 
information (in particular by pointing out that information typically is strictly less than data), 
derive the Holevo bound from our main result, and look at its classical case: we show that this 
includes the solution to the problem of extrinsic/intrinsic data separation with a known source, 
then compare with the well-known notion of sufficient statistics. The result on decomposition of 
quantum operations is used to exhibit a new aspect of the concept of entropy exchange of an open 
dynamics. 

An appendix collects several estimates for mixed state fidelity and trace norm distance, that seem 
to be new, in particular a construction of canonical purification of mixed states that turns out to 
be valuable to analyze their fidelity. 
I. THE PROBLEM 

Consider a quantum system, represented by a Hilbert space $\mathcal{H}$ (which we assume to be
of dimension $d < \infty$ in the sequel), and a measurement on this system, described by a
positive operator valued measure (POVM) $\mathbf{a} = (a_1,\ldots,a_m)$,
$a_j \in \mathcal{B}(\mathcal{H})$, such that $a_j \geq 0$ and $\sum_j a_j = \mathbb{1}$.

Following [21] and [28] we shall be concerned with the question "How much information is
obtained by $\mathbf{a}$?", beginning with a clarification of what this question should mean
at all. Imagine that a family of states (represented by density operators) $\rho_i$ on
$\mathcal{H}$ is given, let us say with a priori probabilities $p_i$, such that the density
operator of this source of states is $\rho = \sum_i p_i \rho_i$. Then the "information" in
question could mean the information in $j$ about $i$, and one way to quantify it would be
given by Shannon's mutual information [24] $I(i \wedge j)$. Note that this is in general less
than the amount of raw data, which is operationally quantified by the entropy of the
distribution of the $j$: $H(\lambda)$, with $\lambda_j = \mathrm{Tr}(\rho a_j)$, due to
Shannon's source coding theorem [24].
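The gap between data and information can be made concrete in a small numerical sketch. The source (two pure qubit states with equal priors) and the three-outcome "trine" POVM below are illustrative assumptions, not taken from the text; the code only evaluates $I(i \wedge j)$ and $H(\lambda)$ for them.

```python
import math

def entropy(probs):
    """Shannon entropy in bits, skipping zero entries."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def overlap_sq(theta, phi):
    """|<e(theta)|e(phi)>|^2 for real qubit states cos(t)|0> + sin(t)|1>."""
    return math.cos(theta - phi) ** 2

# Hypothetical source: two pure states at angles 0 and pi/4, equal priors.
priors = [0.5, 0.5]
state_angles = [0.0, math.pi / 4]

# Hypothetical POVM: trine elements (2/3)|e_t><e_t| at angles pi*t/3.
povm_angles = [math.pi * t / 3 for t in range(3)]
weight = 2.0 / 3.0

# Joint distribution p(i, j) = p_i * Tr(rho_i a_j).
joint = [[p * weight * overlap_sq(s, a) for a in povm_angles]
         for p, s in zip(priors, state_angles)]

# Outcome distribution lambda_j = Tr(rho a_j) and the two quantities compared.
lam = [sum(row[j] for row in joint) for j in range(3)]
data = entropy(lam)  # H(lambda): amount of raw data
info = entropy(priors) + data - entropy([x for row in joint for x in row])
```

For any source and POVM the mutual information `info` cannot exceed the outcome entropy `data`; in this particular instance it is strictly smaller.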

This choice however is rather arbitrary: asking about the identity of the state from a list.
Why not allow a different list, or ask for some property of the state? Also, mutual
information is a measure of correct identification; but what if we need only "almost
correct" identification, as in quantum statistical detection theory fllPf? 

It seems hence that specifying the information in measurement results, or even only the
amount, in an operationally satisfying way, is problematic, and one reason might be the
complementarity of quantum mechanics: qualitatively, accessing some observable property
optimally entails rather poor performance for others. Nevertheless, it is quite obvious
intuitively that in almost any POVM there is "quantum noise", i.e. redundancy put into the
$j$ by the very quantum mechanical probability rule, most simply due to nonorthogonality of
the operators $a_j$, for example in an overcomplete system (see e.g. @). 

Our approach will thus be from the opposite end: instead of attempting the impossible,
defining what "useful" means in any circumstances, we adopt a very simple criterion of
uselessness: statistical independence from the measured states, because independent
randomness can be generated from outside without accessing the quantum system. On the other
hand we do not permit a distortion of the measurement itself, so that we are forced to
consider a simulation of the original measurement by means of, first, a random choice $\nu$
of a measurement $\mathbf{a}^{(\nu)}$ from a list and, second, computation of a result from
the outcome of this measurement and the random choice, such that the statistical
distribution of these results is indistinguishable from that of the original measurement,
on any prepared state. 

Because we can absorb the computation of the results into the labelling of the
$a^{(\nu)}_j$, this means that we aim at finding POVMs $\mathbf{a}^{(\nu)}$, whose elements
are labelled by the same $j$ as $\mathbf{a}$, and probabilities $x_\nu$, such that

$$a_j = \sum_\nu x_\nu a^{(\nu)}_j. \qquad (1)$$

(The operators must be the same because otherwise there would be states that induce
distinguishable outcome distributions. Below we will introduce an element of approximation
into this scheme.) 

Why should we want to do such a decomposition, interesting though the structure exhibited
(convex set of POVMs) might be mathematically? Observe that each $\mathbf{a}^{(\nu)}$ has
its distribution of outcomes, with the probabilities
$\lambda^{(\nu)}_j = \mathrm{Tr}(\rho a^{(\nu)}_j)$ of $j$ conditional on $\nu$. Shannon's
source coding theorem [24] quantifies the amount of data in such a source as the (Shannon)
entropy

$$H(\lambda^{(\nu)}) = -\sum_j \lambda^{(\nu)}_j \log \lambda^{(\nu)}_j$$

by compression (we note that in this paper all logs and exps are to basis 2). Hence, on
average, one needs

$$H(j|\nu) := \sum_\nu x_\nu H(\lambda^{(\nu)})$$

bits to faithfully compress the data ($j$), given $\nu$ as side-information. 

This motivates the study of the function 

$$S(\rho,\mathbf{a}) := \min\Big\{ H(j|\nu) : \mathbf{a} = \sum_\nu x_\nu \mathbf{a}^{(\nu)} \Big\}, \qquad (2)$$

which is the minimum data rate (in Shannon's sense) for 
exact reconstruction of the data. 



Example 1 Look at a qubit system, $\mathbb{C}^2$, with basis $\{|0\rangle, |1\rangle\}$;
there let us consider the five "Chrysler" states (in analogy to the "Mercedes" trine states)

$$|e_t\rangle = \cos\frac{\pi t}{5}\,|0\rangle + \sin\frac{\pi t}{5}\,|1\rangle, \quad \text{for } t = 0,\ldots,4.$$

The collection $\mathbf{a} = \big(\tfrac{2}{5}|e_t\rangle\langle e_t|\big)_{t=0,\ldots,4}$ is a
POVM, and we can determine its decompositions into extremal ones: these latter are given by
putting weights on the $|e_t\rangle\langle e_t|$, and it is straightforward that for an
extremal POVM at most 3 can be nonzero (as the "Chrysler" states form a pentagon on the
Bloch sphere equator). In fact, every extremal must be of the form

$$\big(\alpha|e_t\rangle\langle e_t|,\ \beta|e_{t+2}\rangle\langle e_{t+2}|,\ \beta|e_{t+3}\rangle\langle e_{t+3}|\big), \quad t = 0,\ldots,4,$$

indices understood modulo 5. From here one can determine the weights to be

$$\alpha = 1 - \cot^2\frac{2\pi}{5} \approx 0.8944, \qquad \beta = \frac{1}{2}\Big(1 + \cot^2\frac{2\pi}{5}\Big) \approx 0.5528.$$

For simplicity now look at the maximally mixed state $\rho = \frac{1}{2}\mathbb{1}$, for
which it is unimportant which decomposition into these extremal POVMs is chosen, as all
contributions $\nu$ will give the same Shannon entropy:

$$S(\rho,\mathbf{a}) = H(j|\nu) = H\Big(\frac{\alpha}{2},\frac{\beta}{2},\frac{\beta}{2}\Big) = H(1-\beta,\beta) + \beta\,H\Big(\frac{1}{2},\frac{1}{2}\Big) = H(1-\beta,\beta) + \beta \approx 1.5447.$$

In contrast, the main theorem 2 below will achieve a rate of $H(\rho) = 1$, asymptotically. 
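The numbers of Example 1 can be checked with a few lines of arithmetic. The sketch below is only a verification aid under the reading that the weights are $\alpha = 1 - \cot^2(2\pi/5)$ and $\beta = \frac{1}{2}(1 + \cot^2(2\pi/5))$; the helper names are illustrative.

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def projector(theta):
    """|e><e| for |e> = cos(theta)|0> + sin(theta)|1>, as a 2x2 nested list."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, c * s], [c * s, s * s]]

def weighted_sum(weights, thetas):
    """Sum of weight * projector over the given angles."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    for w, th in zip(weights, thetas):
        p = projector(th)
        for r in range(2):
            for c in range(2):
                total[r][c] += w * p[r][c]
    return total

angles = [math.pi * t / 5 for t in range(5)]

# The full POVM: weight 2/5 on each of the five projectors.
full = weighted_sum([0.4] * 5, angles)

# Extremal POVM for t = 0: weights (alpha, beta, beta) on e_0, e_2, e_3.
cot2 = 1.0 / math.tan(2 * math.pi / 5) ** 2
alpha = 1.0 - cot2
beta = 0.5 * (1.0 + cot2)
extremal = weighted_sum([alpha, beta, beta], [angles[0], angles[2], angles[3]])

# Uniform mixture of the five extremal POVMs: total weight put on each projector.
mixture = [0.0] * 5
for t in range(5):
    mixture[t] += alpha / 5
    mixture[(t + 2) % 5] += beta / 5
    mixture[(t + 3) % 5] += beta / 5

# Outcome entropy of one extremal POVM on rho = (1/2) * identity.
rate = entropy([alpha / 2, beta / 2, beta / 2])
```

Both `full` and `extremal` come out as the identity matrix, the uniform mixture of the five extremal POVMs restores the weight 2/5 on every projector, and `rate` agrees with the value 1.5447 quoted in the example.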

The computation of $S(\rho,\mathbf{a})$ is an interesting problem in its own right (in
particular the question if anything can be gained on $S$ by considering multiple copies,
i.e. the additivity problem), however we take a different approach, bearing in mind that
the operational content of Shannon's theorem involves block coding, i.e., a large number
$l$ of independent copies of the simple system described above, and an arbitrarily small
yet nonzero error probability.

Thus we are really decomposing the POVM

$$\mathbf{a}^{\otimes l} = \big(a_{j^l} = a_{j_1}\otimes\cdots\otimes a_{j_l}\big)_{j^l\in[m]^l},$$

where we have introduced the notation $j^l = j_1\ldots j_l$ for a string of symbols, used
henceforth. And the error introduced through block compression entails that instead of
eq. (1) we will only have

$$\mathbf{a}^{\otimes l} \approx \mathbf{A} = \sum_\nu x_\nu \mathbf{A}^{(\nu)}, \qquad (3)$$



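where the $\approx$ sign is made precise to mean "average approximation of outcome
statistics": assuming an ensemble $\{\sigma_k, q_k\}$ with
$\sum_k q_k\sigma_k = \rho^{\otimes l}$, there is the joint distribution of input $k$ and
output $j^l$ when applying $\mathbf{a}^{\otimes l}$:

$$\gamma(k,j^l) = q_k\,\mathrm{Tr}(\sigma_k a_{j^l}), \qquad (4)$$

and likewise for $\mathbf{A}$:

$$\Gamma(k,j^l) = q_k\,\mathrm{Tr}(\sigma_k A_{j^l}). \qquad (5)$$

Then we require that, independent of the particular ensemble,

$$\frac{1}{2}\|\gamma - \Gamma\|_1 = \frac{1}{2}\sum_{k,j^l}\big|\gamma(k,j^l) - \Gamma(k,j^l)\big| \leq \epsilon. \qquad (CP)$$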
(It is not difficult to see that eq. (1) raised to the $l$th tensor power, together with
Shannon compression of the outcomes of
$\mathbf{a}^{(\nu_1)}\otimes\cdots\otimes\mathbf{a}^{(\nu_l)}$ for the probable
$\nu_1\ldots\nu_l$, yields exactly that.) Indeed we can, using the abbreviation
$\omega = \rho^{\otimes l}$, rewrite eq. (4) as

$$\gamma(k,j^l) = \mathrm{Tr}\big(\omega^{-1/2} q_k\sigma_k\,\omega^{-1/2}\sqrt{\omega}\,a_{j^l}\sqrt{\omega}\big) = \mathrm{Tr}\big(S_k\sqrt{\omega}\,a_{j^l}\sqrt{\omega}\big),$$

observing that the $S_k = \omega^{-1/2} q_k\sigma_k\,\omega^{-1/2}$ form a POVM on
$\mathcal{H}^{\otimes l}$ (this fact was observed before, and used in [16] to classify all
ensembles with a given average state). Similarly

$$\Gamma(k,j^l) = \mathrm{Tr}\big(S_k\sqrt{\omega}\,A_{j^l}\sqrt{\omega}\big),$$

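and we can rewrite and estimate the left hand side of (CP) as follows:

$$\frac{1}{2}\|\gamma - \Gamma\|_1 = \frac{1}{2}\sum_{k,j^l}\Big|\mathrm{Tr}\big(S_k\sqrt{\omega}(a_{j^l} - A_{j^l})\sqrt{\omega}\big)\Big| \leq \frac{1}{2}\sum_{j^l}\big\|\sqrt{\omega}(a_{j^l} - A_{j^l})\sqrt{\omega}\big\|_1,$$

so (CP) is in fact implied by

$$\frac{1}{2}\sum_{j^l}\big\|\sqrt{\omega}(a_{j^l} - A_{j^l})\sqrt{\omega}\big\|_1 \leq \epsilon. \qquad (CM)$$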
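The implication from (CM) to (CP) can be illustrated numerically. The sketch below assumes NumPy and uses randomly generated operators on a single three-dimensional system in place of $\rho^{\otimes l}$, $\mathbf{a}^{\otimes l}$ and $\mathbf{A}$; all names and sizes are illustrative choices, and the normalization with the factor 1/2 is the one used in the surrounding conditions.

```python
import numpy as np

rng = np.random.default_rng(0)

def psd_power(mat, power):
    """Functional calculus for a positive definite Hermitian matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * vals ** power) @ vecs.conj().T

def random_positive(dim):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return g @ g.conj().T

def random_povm(dim, outcomes):
    """Random POVM: rescale random positive operators so that they sum to 1."""
    ops = [random_positive(dim) for _ in range(outcomes)]
    inv_sqrt = psd_power(sum(ops), -0.5)
    return [inv_sqrt @ g @ inv_sqrt for g in ops]

def trace_norm(mat):
    return float(np.linalg.svd(mat, compute_uv=False).sum())

dim = 3
omega = random_positive(dim)
omega = omega / np.trace(omega).real   # plays the role of rho^{(x) l}
sqrt_omega = psd_power(omega, 0.5)

a = random_povm(dim, 4)   # stands in for the product POVM
A = random_povm(dim, 4)   # stands in for the approximating POVM
S = random_povm(dim, 5)   # S_k; ensemble q_k sigma_k = sqrt(omega) S_k sqrt(omega)

def joint(povm):
    """Joint distribution Tr(S_k sqrt(omega) x_j sqrt(omega)) of input k and output j."""
    return np.array([[np.trace(s @ sqrt_omega @ x @ sqrt_omega).real
                      for x in povm] for s in S])

gamma, Gamma = joint(a), joint(A)
cp_value = 0.5 * np.abs(gamma - Gamma).sum()
cm_value = 0.5 * sum(trace_norm(sqrt_omega @ (x - y) @ sqrt_omega)
                     for x, y in zip(a, A))
```

Whatever POVM `S` is drawn, `cp_value` stays below `cm_value`, because summing $|\mathrm{Tr}(S_k X)|$ over a POVM $(S_k)$ is bounded by the trace norm of $X$.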



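Notice that the condition can be phrased in a particularly nice way introducing the
quantum operations

$$\varphi_*: \sigma \mapsto \sum_j \mathrm{Tr}(\sigma a_j)\,|j\rangle\langle j|, \qquad (6)$$

$$\Phi_*: \sigma \mapsto \sum_{j^l} \mathrm{Tr}(\sigma A_{j^l})\,|j^l\rangle\langle j^l|. \qquad (7)$$

Namely, for a purification $\pi$ of $\rho$, (CM) is easily seen to be equivalent to

$$\frac{1}{2}\Big\|(\mathrm{id}\otimes\Phi_*)(\pi^{\otimes l}) - (\mathrm{id}\otimes\varphi_*^{\otimes l})(\pi^{\otimes l})\Big\|_1 \leq \epsilon. \qquad (8)$$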

The organization of the paper is as follows: In section II we will present our main
theorem 2 and its proof, which is much more satisfying than results in previous work [21],
[28], that can now be regarded as precursors: they are shown to easily follow from
theorem 2 in section III. Section IV is concerned with the asymptotic optimality of our
main theorem, a strong converse result, theorem 8. After this, in section V we apply our
result to a kind of asymptotic normal form of completely positive trace preserving maps
(operations as well as instruments), and present an extensive discussion in section VI: we
restate our observation from [28] that one ought to distinguish obtained data from
information, give a new, conceptually simple proof of the Holevo bound, remark on the
classical case of the main theorem (which includes the problem of separating extrinsic and
intrinsic data under a known source ensemble), comment on the related concept of
sufficient statistics, and discuss the bearing of our results on the concept of entropy
exchange of an open dynamics of a system. We close with a challenging open problem. An
appendix features several not widely known facts about the mixed state fidelity, in
particular introducing canonical purifications of mixed states; a second appendix collects
properties of typical sequences and typical subspaces, used in the main text. 



II. SEPARATING EXTRINSIC AND 
INTRINSIC DATA 

We want to represent (up to a small deviation as specified by the (CM) condition)
$\mathbf{a}^{\otimes l}$ as a convex combination of POVMs $\mathbf{A}^{(\nu)}$, with
positive weights $x_\nu$, $\nu = 1,\ldots,N$, each being defined on the set $[m]^l$ and
having a small number $M$ of sequences on which it is supported (i.e. where
$A^{(\nu)}_{j^l} \neq 0$): this is an even stronger requirement than the entropy condition
we had considered in the introduction. Performing $\mathbf{A}$ amounts to choosing a $\nu$ (with 
FIG. 1: The source represents a number of possible states 
encountered by the POVM, but there is no way of knowing 
which is present (apart from the a priori distribution). The 
data produced by the measurement is then stored in a record. 
The rates of these processes are represented by the sizes of the 
different boxes and width of the data flow arrows: originally 
the rates of the source and of the measurement outcomes are 
both large. 

probability $x_\nu$), and performing $\mathbf{A}^{(\nu)}$, which itself can generate at most
$M$ different outcomes: the $\nu$-part of the produced data is obviously independent of the
incoming signal, while the measurement outcome (conditional on the $\nu$ chosen) contains
the useful information. 
Our central result is: 

Theorem 2 There exist POVMs $\mathbf{A}^{(\nu)}$ on $[m]^l$, $\nu = 1,\ldots,N$, each
supported on a set of cardinality at most $M$, where

$$M = \exp\big(l\,I(\lambda;\rho) + O(\sqrt{l})\big),$$

$$N = \exp\big(l\,(H(\lambda) - I(\lambda;\rho)) + O(\sqrt{l})\big),$$

such that for $\mathbf{A} = \frac{1}{N}\sum_\nu \mathbf{A}^{(\nu)}$ condition (CM) is satisfied. 
The characteristic constant in the exponent is 

$$I(\lambda;\rho) = H(\rho) - \sum_j \lambda_j H(\hat\rho_j),$$
the entropy defect of the ensemble (Lebedev and Levitin fl^l ), or the quantum mutual
information between a sender producing letter $j$ with probability $\lambda_j$ and a
receiver getting the letter state $\hat\rho_j$ (see ^3|). Here
$\lambda_j = \mathrm{Tr}(\rho a_j)$ and
$\hat\rho_j = \frac{1}{\lambda_j}\sqrt{\rho}\,a_j\sqrt{\rho}$ (compare theorem 6 below). It is
the difference between the von Neumann entropy $H(\rho) = -\mathrm{Tr}\,\rho\log\rho$ of the
ensemble and its conditional entropy
$H(\hat\rho|\lambda) = \sum_j \lambda_j H(\hat\rho_j)$.
Observe that not only $\rho$ can be recovered from this ensemble (as its average), but
also the POVM $\mathbf{a}$:

$$a_j = \rho^{-1/2}\,\lambda_j\hat\rho_j\,\rho^{-1/2}.$$
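A minimal numerical sketch of this ensemble, assuming NumPy: the full-rank qubit state and the trine POVM below are illustrative choices, not taken from the text. The code builds $\{\hat\rho_j, \lambda_j\}$ from $\rho$ and $\mathbf{a}$, evaluates $I(\lambda;\rho)$ and $H(\lambda)$, and recovers both $\rho$ and the POVM elements from the ensemble.

```python
import numpy as np

def psd_power(mat, power):
    """Functional calculus for a positive definite Hermitian matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * vals ** power) @ vecs.conj().T

def von_neumann_entropy(rho):
    """H(rho) = -Tr rho log2 rho, ignoring numerically zero eigenvalues."""
    vals = np.linalg.eigvalsh(rho)
    vals = vals[vals > 1e-12]
    return float(-(vals * np.log2(vals)).sum())

# Hypothetical example: a full-rank qubit state and the three-outcome trine POVM.
rho = np.array([[0.7, 0.1], [0.1, 0.3]])
vectors = [np.array([np.cos(np.pi * t / 3), np.sin(np.pi * t / 3)]) for t in range(3)]
povm = [(2 / 3) * np.outer(v, v) for v in vectors]

sqrt_rho = psd_power(rho, 0.5)
inv_sqrt_rho = psd_power(rho, -0.5)

# Ensemble: lambda_j = Tr(rho a_j), rho_hat_j = sqrt(rho) a_j sqrt(rho) / lambda_j.
lam = [float(np.trace(rho @ a)) for a in povm]
rho_hat = [sqrt_rho @ a @ sqrt_rho / l for a, l in zip(povm, lam)]

# Intrinsic rate I(lambda; rho) and total data rate H(lambda).
holevo = von_neumann_entropy(rho) - sum(
    l * von_neumann_entropy(r) for l, r in zip(lam, rho_hat))
shannon = float(-sum(l * np.log2(l) for l in lam))

# Recover the average state and the POVM elements from the ensemble.
average = sum(l * r for l, r in zip(lam, rho_hat))
recovered = [inv_sqrt_rho @ (l * r) @ inv_sqrt_rho for l, r in zip(lam, rho_hat)]
```

Since the trine elements have rank one, every $\hat\rho_j$ here is pure and `holevo` reduces to $H(\rho)$; the difference `shannon - holevo` is the rate that theorem 2 attributes to the extrinsic part.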

This construction is known as the "square root measure- 
ment" p3| , or "pretty good measurement" |)|. We shall 



FIG. 2: A nice way of picturing the content of theorem 2 is 
in the form of an elaborate bottleneck between source and 
outcomes: it is supplied from outside with the extrinsic data 
$\nu$, and conditional on this and the incoming $k$ produces the 
intrinsic data $j$. Only the intrinsic data are correlated to the 
signal $k$, while the extrinsic data (though evidently an indispensable 
part of the whole data) is independent of it. To put 
it pointedly: while it is difficult and possibly ambiguous to 
speak of "useful data", one can clearly identify data of no import 
in all respects: the unrelated randomness $\nu$. This is put 
into the focus by theorem 2, and our concept of usefulness 
is just the remainder after extracting as much uselessness as 
possible. 

give the proof of theorem 2 in a minute, after a few preparations. A central part of the
argument is the following auxiliary result of Ahlswede and Winter that we state separately: 

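Lemma 3 (Ahlswede, Winter [^J, thm. A.19) Let $X_1,\ldots,X_M$ be independent identically
distributed (i.i.d.) random variables with values in the algebra
$\mathcal{L}(\mathcal{K})$ of linear operators on $\mathcal{K}$, which are bounded between
$0$ and $\mathbb{1}$. Assume that the average
$\mathbb{E}X_\mu = \sigma \geq s\mathbb{1}$. Then for every $0 < \eta < 1/2$

$$\Pr\Big\{\frac{1}{M}\sum_{\mu=1}^M X_\mu \notin [(1\pm\eta)\sigma]\Big\} \leq 2\dim\mathcal{K}\,\exp\Big(-M\frac{\eta^2 s}{2\ln 2}\Big),$$

where $[(1\pm\eta)\sigma] = [(1-\eta)\sigma; (1+\eta)\sigma]$ is an interval in the operator
order: $[A;B] = \{X \in \mathcal{B}(\mathcal{K}) : A \leq X \leq B\}$. 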
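To get a feeling for the sample sizes involved, the right hand side of lemma 3 can be evaluated directly. The sketch below is only an illustration: the function names are hypothetical, and it relies on the convention stated earlier in the paper that exp is taken to basis 2.

```python
import math

def operator_chernoff_bound(dim, M, eta, s):
    """Right hand side 2 * dim * exp(-M * eta^2 * s / (2 ln 2)), exp to basis 2."""
    return 2 * dim * 2.0 ** (-M * eta ** 2 * s / (2 * math.log(2)))

def samples_needed(dim, eta, s, failure):
    """Smallest M for which the bound is at most `failure`."""
    return math.ceil(2 * math.log(2) * math.log2(2 * dim / failure) / (eta ** 2 * s))

# Illustrative parameters: qubit-sized operators, 10% relative accuracy,
# mean operator bounded below by 0.5 times the identity.
bound = operator_chernoff_bound(dim=2, M=1000, eta=0.1, s=0.5)
M_min = samples_needed(dim=2, eta=0.1, s=0.5, failure=0.01)
```

The number of samples needed scales like $\log(\dim\mathcal{K})/(\eta^2 s)$, which is what ties the admissible $M$ and $N$ in the proof below to the smallest relevant eigenvalues.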



We shall use the concepts of typical and conditionally typical subspaces in the form of
[27], which we collect in appendix B. 

Proof of theorem 2. Define the following operators: for
$j^l \in \mathcal{T}^l_{\lambda,\delta}$ let 



(9) 
(10) 



We choose 6 = mJ — , so that 



S:=A®<(T^)>l-e, 
Tr^ > 1-e, 



which is true by Chebyshev's inequality and eqs. ( ]B2| ) 
and (BS), specifying e later. 

Notice that in this way $\mathrm{Tr}\,\omega' \geq 1 - 2\epsilon$ for 



By eq. <\BH) we have 



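$$\Pi^l_{\rho,\delta}\,\omega\,\Pi^l_{\rho,\delta} \geq \alpha\,\Pi^l_{\rho,\delta},$$

with $\alpha = \exp(-lH(\rho) - O(\sqrt{l}))$. Define now $\Pi$ to be the projector onto the
subspace spanned by the eigenvectors of $\omega'$ with eigenvalue $\geq \epsilon\alpha$. By
construction we find $\mathrm{Tr}\,\Omega \geq 1 - 3\epsilon$ for $\Omega = \Pi\omega'\Pi$.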

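Now let $\xi_{j^l} = \Pi\,\xi'_{j^l}\,\Pi$ and define i.i.d. random variables
$J^{(\nu)}_\mu \in \mathcal{T}^l_{\lambda,\delta}$, $\nu = 1,\ldots,N$, $\mu = 1,\ldots,M$, by

$$\Pr\{J^{(\nu)}_\mu = j^l\} = \frac{\lambda_{j^l}}{S}.$$

That is, we consider $N$ independent sets of $M$ independent choices each, from
$\mathcal{T}^l_{\lambda,\delta}$. Observe that
$\Omega/S = \mathbb{E}\,\xi_{J^{(\nu)}_\mu}$, the expected value of the random operators
$\xi_{J^{(\nu)}_\mu}$.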

We shall show that with high probability the following 
conditions hold: 



1 M 



for all v, and 



11=1 



^£f#>€[(l±e)Z/]. 



NM 



(II) 



This is most easily seen with the help of lemma 3: according to it 



Pr{-I„} < 2TrIIexp —M 



2/3 In 2/ ' 



Pr{-H} < 2|7£« |exp (-NM 



£ 7 
21n2, 
with 

$$\gamma = \min\{\lambda_{j^l} : j^l \in \mathcal{T}^l_{\lambda,\delta}\} \geq \exp\big(-lH(\lambda) - Km\delta\sqrt{l}\big),$$

compare eq. (B8). Choosing $M$ and $N$ according to the theorem's statement will force the
sum of these probabilities to be less than 1, i.e. with positive probability all the
events (I$_\nu$) and (II) happen. 

Let us assume we fix now values for the $j^{(\nu)}_\mu$ such that all equations (I$_\nu$)
and (II) are satisfied. Then we may define operators 

$$A^{(\nu)}_{j^l} := \frac{S}{1+\epsilon}\,\omega^{-1/2}\Big(\frac{1}{M}\sum_{\mu:\,j^{(\nu)}_\mu = j^l}\xi_{j^l}\Big)\omega^{-1/2}.$$

We check that for each $\nu$ these form a sub-POVM (i.e., a collection of positive
operators with sum upper bounded by $\mathbb{1}$): using (I$_\nu$) and the definitions of
$\Omega$ and $\omega'$ we find 




1 M 
U E 



Finally, we check that condition (CM) holds: it is suffi- 
cient to do this for the sub-POVM constructed, because 
then we can distribute the remaining operator weight to 
fill up to 1 arbitrarily. 

We calculate directly from the definitions: 



i f>ji 



<- 2 d-s) 



S\{vy: =3 l }\ 
(l + e)NM 

E L j l n\\Pi l - &h 



< 



E 



1 + 



< e 



E ^Qii^-^iii + ^ii^-^iii) 



(11) 

By the definition of $\xi'_{j^l}$, using eq. (10) and lemma 4 below, we can bound the first
of the two terms in brackets by $\epsilon + \sqrt{2\epsilon}$. It remains to estimate the
second: consider 



n' 



E 



and recall that £ji 
struction we have 



n$',n, hence n = lift' II. By con- 



E ty*#>i 



3e, 



thus, using lemma 4 with each of the $\xi_{j^l}$ and employing concavity of the square root
function, we end up with 



'6e, 



E -SHU < 

which allows us to estimate (11) by $2\epsilon + \sqrt{2\epsilon} + \sqrt{6\epsilon}$. □ 

Here is the lemma that we needed in the proof: it says that a POVM element that is likely
to respond to a state acts "gently" on it in the sense of little disturbance. 

Lemma 4 (Lemma V.9 of [27]) For a state $\rho$ and an operator
$0 \leq X \leq \mathbb{1}$, if $\mathrm{Tr}(\rho X) \geq 1 - \lambda$, then

$$\big\|\rho - \sqrt{X}\rho\sqrt{X}\big\|_1 \leq \sqrt{8\lambda}.$$

The same is true if $\rho$ is only a subnormalized density operator. □ 
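Lemma 4 is easy to probe numerically. The following sketch, assuming NumPy, draws random states and random operators $0 \leq X \leq \mathbb{1}$ on a three-dimensional space (an arbitrary illustrative choice) and records the smallest observed slack between $\sqrt{8\lambda}$ and the trace norm disturbance; it is a sanity check, not a proof.

```python
import numpy as np

rng = np.random.default_rng(1)

def psd_sqrt(mat):
    """Square root of a positive semidefinite Hermitian matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.conj().T

def trace_norm(mat):
    return float(np.linalg.svd(mat, compute_uv=False).sum())

def random_state(dim):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def random_effect(dim):
    """Random operator with 0 <= X <= 1: random eigenbasis, eigenvalues in [0, 1]."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(g)
    vals = rng.uniform(0.0, 1.0, size=dim)
    return (q * vals) @ q.conj().T

smallest_slack = np.inf
for _ in range(200):
    rho, X = random_state(3), random_effect(3)
    lam = max(1.0 - np.trace(rho @ X).real, 0.0)   # lambda with Tr(rho X) = 1 - lambda
    root = psd_sqrt(X)
    disturbance = trace_norm(rho - root @ rho @ root)
    smallest_slack = min(smallest_slack, np.sqrt(8.0 * lam) - disturbance)
```

A nonnegative `smallest_slack` means the bound held in every sampled instance.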

III. PREVIOUS APPROACHES 



The question addressed in the present paper of quantifying the "amount of information
obtained by a quantum measurement" has been posed before, in the works [21] and [28], with
mathematical modellings different from ours, though there is an evolution leading from the
first to the present: 

In [21] the POVM $\mathbf{a}$ was assumed to maximize a certain Bayesian gain (there called
"fidelity") 

$$F(\mathbf{a}) = \sum_{i,j} p_i\,\mathrm{Tr}(\rho_i a_j)\,F_{ij},$$

to achieve the optimal (i.e. maximal) value $F_{\rm opt}$. On blocks of length $l$ the gain
(or fidelity) function was extended by defining
$F_{i^l j^l} = \frac{1}{l}\sum_{k=1}^l F_{i_k j_k}$. This definition has the easily checked
property that the gain on blocks of length $l$, 

$$F(\mathbf{a}^{\otimes l}) = \sum_{i^l,j^l} p_{i^l}\,\mathrm{Tr}(\rho_{i^l} a_{j^l})\,F_{i^l j^l}, \qquad (12)$$

equals the single letter expression $F(\mathbf{a})$. 

Note that in this way the maximum Bayesian gain is still $F_{\rm opt}$ (which can be seen
from eq. (13) below). Then the following theorem was shown: 



Theorem 5 (Massar, Popescu [21]) For $\epsilon > 0$ and $l$ large enough there exists a
POVM $\mathbf{A}$ with fidelity $F(\mathbf{A}) \geq F_{\rm opt} - \epsilon$ and

$$M \leq \exp\big(l(H(\rho) + \epsilon)\big)$$

many outcomes among the $j^l$. 
This result was interpreted as saying that about any property of the ensemble states, as
encoded in the Bayesian gain matrix $F$, one can learn at most one bit per qubit. 

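In [28] this was extended and clarified as follows: observe that for any POVM
$\mathbf{A} = (A_{j^l_\mu})_{\mu=1,\ldots,M}$ one has 

$$F(\mathbf{A}) = \sum_{i^l} p_{i^l}\sum_{j^l}\mathrm{Tr}(\rho_{i^l}A_{j^l})\,\frac{1}{l}\sum_{k=1}^l F_{i_k j_k} = \frac{1}{l}\sum_{k=1}^l\sum_{i,j} p_i\,\mathrm{Tr}\big(\rho_i(\mathbf{A}|k)_j\big)\,F_{ij}, \qquad (13)$$

where (with $[l] = \{1,\ldots,l\}$) 

$$(\mathbf{A}|k)_j = \mathrm{Tr}_{[l]\setminus k}\Big[\big(\rho^{\otimes(k-1)}\otimes\mathbb{1}\otimes\rho^{\otimes(l-k)}\big)\sum_{j^l:\,j_k = j} A_{j^l}\Big].$$

For each $k$, the collection $((\mathbf{A}|k)_j)_{j=1,\ldots,m}$ obviously is a POVM on
$\mathcal{H}$. We may assume (as we shall do in the sequel) that the $|F_{ij}|$ are bounded
by 1: then the fidelity condition of theorem 5, reading 

$$|F(\mathbf{A}) - F(\mathbf{a})| \leq \epsilon, \qquad (C0)$$

is implied by 

$$\forall k\quad \sum_{i,j}\big|p_i\,\mathrm{Tr}(\rho_i(\mathbf{A}|k)_j) - p_i\,\mathrm{Tr}(\rho_i a_j)\big| \leq \epsilon. \qquad (C1)$$

This is itself implied by 

$$\forall k\,\forall i\quad \sum_j\big|\mathrm{Tr}(\rho_i(\mathbf{A}|k)_j) - \mathrm{Tr}(\rho_i a_j)\big| \leq \epsilon, \qquad (C2)$$

which in turn follows from 

$$\forall k\quad \sum_j\big\|(\mathbf{A}|k)_j - a_j\big\|_1 \leq \epsilon. \qquad (C3)$$

It was then proved 

Theorem 6 (Winter, Massar [28]) For the state $\rho$ and the POVM $\mathbf{a}$ define a
canonical ensemble $\{\hat\rho_j, \lambda_j\}$, with states 

$$\hat\rho_j = \frac{1}{\mathrm{Tr}(\rho a_j)}\sqrt{\rho}\,a_j\sqrt{\rho}$$

and probabilities $\lambda_j = \mathrm{Tr}(\rho a_j)$. Given $\epsilon > 0$, there exists a
POVM $\mathbf{A} = (A_{j^l_\mu})_{\mu=1,\ldots,M}$ with 

$$M \leq \exp\Big(l\Big(H(\rho) - \sum_j\lambda_j H(\hat\rho_j)\Big) + C\sqrt{l}\Big)$$

(where $C$ is a constant depending only on $\epsilon$, $d$ and $m$), 
and such that (C3) is satisfied. □ 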

This theorem is in an asymptotic sense best possible (such an optimality was missing in
[21]): 



Theorem 7 (Winter, Massar |28|) Let < e < 

(A /2) 2 , with A = minj Xj. Then for any POVM 
A = (Aji ) M= i i ....A/ such that (C3) holds, one has 



M > exp ( I ( H(p) - ^jH(Pj) + || log ^ 



□ 




FIG. 3: In [21] and [28] the original POVM is replaced by an 
"equivalent" one (as made precise in theorems 5 and 6) with 
much fewer outcomes. So, POVM and data record need much 
less rate of processing, and storage, respectively. Of course, 
compared to theorem 2 we lose many potential measurement 
results in constructing the new POVM. 

Here we want to show that the theorems 5 and 6 may be obtained as corollaries of theorem 2. 

Proof of theorem 5. Choose $x_\nu$ and $\mathbf{A}^{(\nu)}$ according to theorem 2 such that
condition (CM) is satisfied for $\mathbf{A} = \sum_\nu x_\nu\mathbf{A}^{(\nu)}$, with some
$\epsilon > 0$ (which implies that also (CP) is satisfied with the same $\epsilon$). Then,
assuming without loss of generality that $|F_{ij}| \leq 1$, we get immediately out of
eq. (12) that 

$$\big|F(\mathbf{A}) - F(\mathbf{a}^{\otimes l})\big| \leq \epsilon.$$

Since we assume that $\mathbf{a}$ maximizes $F$ we conclude, using linearity of $F$ in the
POVM: 

$$F_{\rm opt} - \epsilon = F(\mathbf{a}^{\otimes l}) - \epsilon \leq F(\mathbf{A}) = \sum_\nu x_\nu F(\mathbf{A}^{(\nu)}).$$

This finally means that for at least one $\nu$ 

$$F(\mathbf{A}^{(\nu)}) \geq F_{\rm opt} - \epsilon,$$

which is what we wanted to prove: recall that $\mathbf{A}^{(\nu)}$ has
$M \leq \exp(l\,I(\lambda;\rho) + O(\sqrt{l})) \leq \exp(l\,H(\rho) + O(\sqrt{l}))$ many
outcomes. □ 

Note that the latter estimate is met with equality if and only if $\mathbf{a}$ is maximally
refined (i.e., consists of rank-1 operators only), so regardless of $\mathbf{a}$,
$H(\rho)$ is the rate of intrinsic data of any probing of the ensemble states. 

Note further that our derivation does not depend on the particular structure of the
block-fidelity: obviously we can as well conclude for any ensemble $\{\sigma_k, q_k\}$ with
average $\omega$ and any fidelity matrix $F_{kj^l}$ that 

$$\big|F(\mathbf{A}) - F(\mathbf{a}^{\otimes l})\big| \leq \sum_{k,j^l} q_k\big|\mathrm{Tr}(\sigma_k A_{j^l}) - \mathrm{Tr}(\sigma_k a_{j^l})\big|\,|F_{kj^l}| \leq \epsilon\|F\|,$$

with $\|F\| := \max_{k,j^l}|F_{kj^l}|$. If now $\|F\| \leq O(F(\mathbf{a}^{\otimes l}))$ for
$l \to \infty$ then we get (for sufficiently large $l$) 

$$F(\mathbf{A}) \geq (1-\epsilon)F(\mathbf{a}^{\otimes l}).$$

Of course, as explained in the introduction, theorem 5 is really a corollary of theorem 6.
So, we continue to prove the latter: 

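Proof of theorem 6. Assume that a collection of POVMs $\mathbf{A}^{(\nu)}$,
$\nu = 1,\ldots,N$, like in theorem 2 is chosen, with probabilities $x_\nu$, such that
$\mathbf{A}' = \sum_\nu x_\nu\mathbf{A}^{(\nu)}$ satisfies (CM). Define i.i.d. random
variables $T_1,\ldots,T_Q$, each with $\Pr\{T_q = \nu\} = x_\nu$. We want to study the
random POVMs $\mathbf{A}^{(T_q)}$, and especially their mean 

$$\mathbf{A} = \frac{1}{Q}\sum_{q=1}^Q \mathbf{A}^{(T_q)}.$$

Observe that $\mathbb{E}\mathbf{A} = \mathbb{E}\mathbf{A}^{(T_q)} = \mathbf{A}'$. 

Recall the definition of marginal POVMs. Obviously, by linearity of this definition, we
have 

$$(\mathbf{A}|k) = \frac{1}{Q}\sum_{q=1}^Q (\mathbf{A}^{(T_q)}|k)$$

and 

$$\mathbb{E}(\mathbf{A}|k) = \mathbb{E}(\mathbf{A}^{(T_q)}|k) = (\mathbf{A}'|k).$$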



From condition (CM) and the monotonicity of the trace 
norm under partial trace we get now, for every fc, 



< e. 



(15) 



Denoting the smallest nonzero eigenvalue of any of the 
yfpcijyfp by u, and choosing e small enough, this assures 
that yfp{A' \k) j yfp restricted to the support of ^fpaj^fp 
is lower bounded by u/2. Then we can apply lemma ^ 
and obtain 

Pr{v£(A|*) 3 -Vpg [(l±e)Vp(A'|fc),Vp] onsupp^} 

/ e 2 u 
< 2dexp (-Q— 



Thus we can estimate the sum of these probabilities over 
all k = 1 , . . . , I and j = 1 , . . . , m to less than 1 if 

4 In 2 

Q > 1 + — =— log(2c#m). 

This implies that there exist actual values of the T q such 
that for all k 

supp pj y /pa js /p\\ l <2e, (16) 



where we observed that the x fp{A'\k)^fp all have trace 
at most 1, and have used eq. (15J) . Hence we get (with 
K kj = Tr (p(A\k)j) and AyPjy = ^/p{A\k) jy /p) 



£> fc ,Tr (P w Upp fe J >l-2e, 
and using lemma ^, this gives 

]T-||Vp(A|fc),vp| Vp(A|fc) J -Vp|| 1 <2Vi. 

Now @ and (0) yield 

Denoting the minimal eigenvalue of p by r (which we 
assumed to be positive) this readily implies 



and we are done, since A has only MQ many possible 
outcomes. □ 



IV. STRONG CONVERSE 

In this section we prove the asymptotic optimality of the separation of the measurement
from theorem 2. To be precise, it is 

Theorem 8 Whenever there are POVMs $\mathbf{A}^{(\nu)}$ on $[m]^l$, $\nu = 1,\ldots,N$,
each supported on at most $M$ elements, and probability weights $x_\nu \geq 0$, such that
$\mathbf{A} = \sum_\nu x_\nu\mathbf{A}^{(\nu)}$ satisfies condition (CM), for some
$\epsilon < 1$, then 

$$M \geq \exp\big(l\,I(\lambda;\rho) - O(\sqrt{l})\big),$$

$$MN \geq \exp\big(l\,H(\lambda) - O(\sqrt{l})\big),$$

where the constants depend only on $\epsilon$. 

Proof. Let us begin with the second inequality: by construction the set
$\mathcal{R} \subset [m]^l$ of possible outcomes of $\mathbf{A}$ has cardinality at most
$MN$. Denoting by $\Lambda$ the distribution of outcomes according to $\mathbf{A}$, i.e. 

$$\Lambda_{j^l} = \mathrm{Tr}\big(\rho^{\otimes l}A_{j^l}\big),$$

from (CM) we get immediately 

$$\frac{1}{2}\big\|\lambda^{\otimes l} - \Lambda\big\|_1 \leq \epsilon, \qquad (18)$$

which in turn implies 

$$\lambda^{\otimes l}(\mathcal{R}) \geq \Lambda(\mathcal{R}) - \epsilon = 1 - \epsilon.$$

By a well known trick [30] the lower bound now follows: 
we consider TV = 1Z D T{ s , with <5 : 



4—^, whence we 

have, using Chebyshev's inequality 

1 - e 



X® l {K') > 
Using the fact (compare eq. (|B^)) 

V?' G 7a,« A j ! < ex P (- lH ( x ) + KmSVl 
we conclude 

MN > > \Tl'\ > CX P ( ZiJ ( A ) _ KmSVPj . 

Now for the first inequality: introduce the ensembles 
{Pf with 



r r v j 

all of which have average w. Then we define the (subnor- 
malized) density operators 



O) 



Pf = Ui tS (j l ) Pjl Ili tS (j l ), 



for j l € Ti g, with 5 = J^f . Then by Chebyshev 
inequality and eq. (fB2) 



while from (CM) we get 



< 



1 -e 



j 1 



These immediately imply 

E^ E A^lVpy >l-e', 

f .-if 7-2 



< ^ =: e'. 



(19) 
(20) 



so there exists at least one v such that 

A^TrP^ >l-e'. 

Now consider the (subnormalized) density operators 



e 



r v j ! U WV j' 



which evidently satisfy 

0:= y A ( .r ) e ( .r ) <VA ( r ) p ( r ) =. 



Denoting with II the projection onto the support of 9 and 
inserting Tr = Tr pjj , we arrive at 



Tr (wn) > 1 - e', 
from where we conclude 

rankll = Trll > exp (lH(p) - O 



This follows by a standard reasoning (which we take 
from [^7)): for F = U l ^nil^ s , choosing 5 large enough, 
we get 



1-e' 



By eq. ( |B8| ) the inequality follows. 

On the other hand each of the 0^., has rank at most 



Tr (n^n^n)=Tr(uF)> 



exp 



(lH{p\P)+0(V~l) 



and we deduce our claim. □ 



We may relax a bit the condition of the theorem regarding the parameter $M$: if we allow
the different POVMs $\mathbf{A}^{(\nu)}$ to have different numbers $M_\nu$ of possible
outcomes, then we can prove the slightly stronger estimate 

$$\overline{M} := \sum_\nu x_\nu M_\nu \geq \exp\big(l\,I(\lambda;\rho) - O(\sqrt{l})\big)$$

(while the second inequality obviously holds for $\sum_\nu M_\nu$). 
To see this go back to eq. (^) and observe that by a 
Markov inequality argument 

Pr B : E A ? Tr P? > 1 - V?} > 1 - V7, 
whence the claim directly follows. 

Remark 9 While in the above proof we assumed the property (CM) for $\epsilon < 1$, we
conjecture that (CP) for all sources with average $\omega$, with $\epsilon < 1$, is
sufficient to arrive at its conclusion. 

Let us inspect this possibility along the lines of the proof: crucial were the estimates
(jT^j and (fH|), the former being an immediate consequence of (CP), so we would have to
show this only for the latter. However, this demonstration has escaped us so far. 

Finally, a comment on why this converse is strong: optimality of theorem 2 is proved
already by our observation in the previous section that it implies theorem 6 and the lower
bound of theorem 7. However, closer inspection of this lower bound reveals that it
coincides with the upper bound only in the limit $\epsilon \to 0$. For positive $\epsilon$
it leaves room for a tradeoff between compression and error (not untypical for the type of
error concept we had used). This is known in information theory as a weak converse |?^/.
The strong converse in contrast shows optimality of the upper bound in the asymptotic
limit $l \to \infty$, with any $\epsilon$ bounded away from 1. 



V. ASYMPTOTIC DECOMPOSITION OF 
INSTRUMENTS AND OPERATIONS 

An interesting generalization of our main theorem arises from the point of view that POVMs
are just a special case of general open dynamics: the most general form of evolution is a
completely positive, trace preserving linear map $\varphi_*$ from states on $\mathcal{H}$
to states on $\mathcal{K}$. Such a map can (non-uniquely) be represented in the Kraus form 

$$\varphi_*(\sigma) = \sum_{j=1}^m V_j\sigma V_j^*, \qquad (21)$$

where $V_j: \mathcal{H} \to \mathcal{K}$ are $\mathbb{C}$-linear and
$\sum_j V_j^* V_j = \mathbb{1}$. The representation can be made unique by considering it as
a partial measurement, and including the outcome $j$: extend the output system to
$\mathcal{K}\otimes\mathcal{J}$, and modify the map $\varphi_*$ to 

$$\sigma \mapsto \sum_{j=1}^m V_j\sigma V_j^* \otimes |j\rangle\langle j|.$$

(Technically this will amount to a change of the Kraus operators, too, but we will not
need the details here.) This is the notion of an instrument (Davies and Lewis ||). One can
see that it is representable in Kraus form, too, so we will in the sequel always look at a
particular Kraus representation. 

In analogy to the question about POVMs of this work we would like to approximate
$\varphi_*^{\otimes l}$ by the average of some $\Phi^{(\nu)}_*$, $\nu = 1,\ldots,N$, each of
which should have a Kraus representation with a small number of contributing operators. As
is well known this number is the dimension of the ancillary system (environment)
sufficient to emulate the effect of the operation by a unitary interaction and subsequent
partial trace. Its logarithm is an upper bound on the "information leakage" from the
system to the environment. 

Note that (apart from looking at approximation) we are considering here the problem of
convex decomposition of completely positive maps, like we did before for POVMs. Of course,
every completely positive map has a decomposition into extremal such ones, with possibly
fewer terms in the Kraus representation. For this one can employ a theorem of Choi pL
saying that $\varphi_*$ from eq. (21) is extremal if and only if the family of operators
$V_j^* V_k$ is linearly independent (in particular, then $m \leq d$). 
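Choi's criterion is straightforward to test numerically for a given Kraus representation. The sketch below, assuming NumPy, vectorizes the products $V_j^* V_k$ and compares their rank with their number; the two qubit channels (amplitude damping and complete depolarization) are illustrative examples that do not appear in the text, and the function name is hypothetical.

```python
import numpy as np

def is_extremal(kraus, tol=1e-10):
    """Choi's criterion: the operators V_j^dagger V_k must be linearly independent."""
    products = [vj.conj().T @ vk for vj in kraus for vk in kraus]
    stacked = np.array([p.reshape(-1) for p in products])
    return int(np.linalg.matrix_rank(stacked, tol=tol)) == len(products)

# Amplitude damping with damping parameter g: two Kraus operators.
g = 0.3
amplitude_damping = [np.array([[1.0, 0.0], [0.0, np.sqrt(1 - g)]]),
                     np.array([[0.0, np.sqrt(g)], [0.0, 0.0]])]

# Completely depolarizing qubit channel: four Kraus operators (Pauli matrices / 2).
paulis = [np.eye(2), np.array([[0, 1], [1, 0]]),
          np.array([[0, -1j], [1j, 0]]), np.array([[1, 0], [0, -1]])]
depolarizing = [0.5 * p for p in paulis]

# Both families satisfy sum_j V_j^dagger V_j = 1, as required of a channel.
for kraus in (amplitude_damping, depolarizing):
    assert np.allclose(sum(v.conj().T @ v for v in kraus), np.eye(2))
```

Under this criterion the amplitude damping channel is extremal, while the depolarizing channel is not: its 16 products $V_j^* V_k$ live in a space of dimension 4, which also illustrates the constraint $m \leq d$.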

We show now how to solve this problem as a consequence of theorem 2, with an additional
reasoning mainly directed to quantum state fidelities: 

Formally, we are looking for a family of maps 

$$\Phi^{(\nu)}_*: \mathcal{B}(\mathcal{H}^{\otimes l}) \to \mathcal{B}(\mathcal{K}^{\otimes l}), \qquad \sigma \mapsto \sum_{\mu=1}^M W^{(\nu)}_\mu\,\sigma\,W^{(\nu)*}_\mu, \qquad (22)$$

and probabilities $x_\nu$ such that for $\Phi_* = \sum_\nu x_\nu\Phi^{(\nu)}_*$ and any
ensemble $\{\sigma_k, q_k\}$ with average $\omega = \rho^{\otimes l}$ the following
condition holds: 

$$\sum_k q_k\big\|\varphi_*^{\otimes l}(\sigma_k) - \Phi_*(\sigma_k)\big\|_1 \leq \epsilon. \qquad (CO)$$

In fact, there is an appealing way to state them all together, and strengthen the content
at the same time: for a purification $\pi$ of $\rho$ on an extended system
$\mathcal{H}\otimes\mathcal{H}'$ we ask for 

$$\frac{1}{2}\Big\|(\varphi_*\otimes\mathrm{id})^{\otimes l}(\pi^{\otimes l}) - (\Phi_*\otimes\mathrm{id}^{\otimes l})(\pi^{\otimes l})\Big\|_1 \leq \epsilon. \qquad (CO*)$$

Indeed, this implies (CO): just observe that by choosing a POVM $(T_k)$ on
$\mathcal{H}'^{\otimes l}$ one can "induce" any ensemble $\{\sigma_k, q_k\}$ on
$\mathcal{H}^{\otimes l}$ for $\omega$, in the following sense: 

$$q_k\sigma_k = \mathrm{Tr}_{\mathcal{H}'^{\otimes l}}\big(\pi^{\otimes l}(\mathbb{1}\otimes T_k)\big).$$

How to do this is explained in detail in [16] (or see appendix A below). Note that this
generalizes the implication of (CP) from (CM), discussed earlier, when we view the POVMs
as the quantum operations eqs. (6) and (7). 

Conversely, assuming (CO) for all ensembles for $\omega$ does unfortunately not imply
(CO*) with a comparable error parameter. (Examples are not hard to construct for which
(CO) holds with a small $\epsilon$ while the bound in (CO*) is close to 1.) 

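With $\varphi_*$ there is associated the POVM 

$$\mathbf{a} = (a_j = V_j^* V_j : j = 1,\ldots,m),$$

and with this goes the ensemble $\{\hat\rho_j, \lambda_j\}$, as before. 

Theorem 10 With the above notation and $\epsilon > 0$ there exist quantum operations
$\Phi^{(\nu)}_*$ in the form of eq. (22), with 

$$M \leq \exp\big(l\,I(\lambda;\rho) + O(\sqrt{l})\big),$$

$$N \leq \exp\big(l\,(H(\lambda) - I(\lambda;\rho)) + O(\sqrt{l})\big),$$

and such that $\Phi_* = \frac{1}{N}\sum_\nu \Phi^{(\nu)}_*$ satisfies (CO*). 

These bounds are asymptotically best possible if $\varphi_*$ is an instrument. 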
Proof. Let $\mathbf{A}^{(\nu)}$ and $x_\nu$ be the POVMs and probabilities constructed in
theorem 2 from $\mathbf{a}^{\otimes l}$ and $\omega = \rho^{\otimes l}$, and let
$\mathbf{A} = \sum_\nu x_\nu\mathbf{A}^{(\nu)}$. We use the notation from the proof of this
theorem and from section IV: 

y/pa jy /p = XjPj, 



v r v 3 3 



»8M 



A, =5>„AM. 



Note that by the proof of theorem || the either are 

or equal to Py := 

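Introduce the unitaries $U_j$ by the polar decomposition

$$V_j \sqrt{\rho} = U_j \sqrt{\sqrt{\rho}\, V_j^* V_j \sqrt{\rho}} = U_j \sqrt{\lambda_j \hat\rho_j}, \qquad (23)$$

and let $U_{j^l} = U_{j_1} \otimes \cdots \otimes U_{j_l}$. Now define $W^{(\nu)}_{j^l}$ by letting 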



W^V^ = UjJA^\ 



(24) 



and observe that for fixed $\nu$ only $M$ of them are nonzero, and that for fixed $j^l$ these are all multiples of each other. Hence these operators define a quantum operation $\varphi^{(\nu)}_*$ according to the theorem, and $\Phi_* = \sum_\nu x_\nu \varphi^{(\nu)}_*$. 

With these definitions we check that (CO*) is satisfied: using $\pi^{\otimes l} = (\sqrt{\omega} \otimes 1^{\otimes l}) |I\rangle\langle I| (\sqrt{\omega} \otimes 1^{\otimes l})$ (see lemma 14 in the appendix) and eqs. (23) and (24) we calculate 



(ifif ® id® l )(7T®') - (*, ® id®')(7T®') 



- > HIT, ®l® i ) 7 r®'(T^®:iL® ! 



< ||A® ( - A||i 



(25) 



The last line here is estimated as follows: the first term 
is bounded by $2\epsilon$ (see the proof of theorem ||), and for 
the other we use lemma 14: observe that for each $j^l$ 
the two terms inside the trace norm are the canonical 



purifications of Pji and Pj 



respectively. Thus we get 



Ipf ® 1 $ 



< 2V2 



Pi' 



and using concavity of the root function and the estimate 
of eq. (O) we can upper bound the last line of eq. (25) 
by $O(\epsilon^{1/4})$. 

If $\varphi_*$ is an instrument any approximate convex decomposition of $\varphi_*^{\otimes l}$ implies a similar decomposition for the POVM $a^{\otimes l}$. Hence theorem || gives the optimality of the bounds for $M$ and $N$. □ 

Interestingly, the bounds of theorem 10 depend on the Kraus representation (21) of the map $\varphi_*$: all other such representations are related by unitary transforms, i.e.

$$\varphi_*(\sigma) = \sum_j V_j \sigma V_j^* = \sum_{j'} V'_{j'} \sigma V'^{*}_{j'}$$

if and only if

$$V'_{j'} = \sum_j u_{j'j} V_j$$

with a unitary matrix $(u_{j'j})_{j'j}$ of complex numbers. (This is essentially a consequence of the uniqueness up to unitaries of the Stinespring dilation [p5| of $\varphi$, which implies the Kraus representation. This fact is also discussed in detail in §|). 
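The unitary freedom can be checked by hand in a toy case. The following is only a sketch under assumptions that are not part of the text: a qubit dephasing map with real Kraus operators $V_1 = \sqrt{1-p}\,1$, $V_2 = \sqrt{p}\,Z$, a Hadamard-type mixing matrix $u$, and hypothetical helper names. It illustrates that both Kraus families give the same map, while the outcome probabilities $\lambda_j = \mathrm{Tr}(\rho V_j^* V_j)$, and hence the data produced, differ.

```python
import math

def matmul(a, b):
    """Product of two real 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def scale(c, a):
    return [[c * a[i][j] for j in range(2)] for i in range(2)]

def apply_channel(kraus, sigma):
    """sigma -> sum_j V_j sigma V_j^* (real matrices, so ^* is the transpose)."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for v in kraus:
        out = add(out, matmul(matmul(v, sigma), transpose(v)))
    return out

def outcome_probs(kraus, rho):
    """lambda_j = Tr(V_j rho V_j^*) = Tr(rho V_j^* V_j)."""
    probs = []
    for v in kraus:
        m = matmul(matmul(v, rho), transpose(v))
        probs.append(m[0][0] + m[1][1])
    return probs

p = 0.1  # hypothetical dephasing strength
I2 = [[1.0, 0.0], [0.0, 1.0]]
Z = [[1.0, 0.0], [0.0, -1.0]]
V = [scale(math.sqrt(1 - p), I2), scale(math.sqrt(p), Z)]

# Second Kraus family V'_k = sum_j u_kj V_j with the real unitary u.
c = 1 / math.sqrt(2)
u = [[c, c], [c, -c]]
V_mixed = [add(scale(u[k][0], V[0]), scale(u[k][1], V[1])) for k in range(2)]

sigma = [[0.7, 0.2], [0.2, 0.3]]   # arbitrary test input state
rho = scale(0.5, I2)               # maximally mixed source state

out_a = apply_channel(V, sigma)
out_b = apply_channel(V_mixed, sigma)
lam_a = outcome_probs(V, rho)
lam_b = outcome_probs(V_mixed, rho)
```

In this example the two output matrices agree entry by entry, whereas the first family yields a biased outcome distribution and the second a uniform one.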

This motivates the introduction of

$$\Sigma(\rho; \varphi_*) = \min_{\text{Kraus repr. of } \varphi_*} I(\lambda; \hat\rho), \qquad (26)$$

i.e. the minimum rate of the parameter $M$ in decompositions of $\varphi_*$ according to theorem 10.

Note that, according to [22], the minimum of $H(\lambda)$ over all Kraus representations is exactly $S_e$, the entropy exchange of the map $\varphi_*$ (with respect to $\rho$). For a discussion see subsection VI E below, and the forthcoming [E9| . 



VI. DISCUSSION 

We have introduced a separation into extrinsic and intrinsic data of a quantum measurement. It was shown to have definite minimal rates for either of these, and that it encompasses all previously known results on "meaningful" data in quantum measurements. A particular advantage of theorem || before theorems || and ^ is that it does not even require a new POVM (which might be experimentally difficult to realize). Instead, it can be understood as a mere re-interpretation of the data delivered by $a^{\otimes l}$: in fact, by our construction in the proof of the theorem, for all $\nu$ and $j^l$ either $A^{(\nu)}_{j^l}$ is $0$ or very close to a multiple of $a_{j^l}$, in the sense of (CM). Hence the random variable $N$, defined as a function of $j^l$: 



Pr{N = v\j 1 } 



■ C i / 
A 



Tr 



Tr 



Tr (uidjt ) 



(27) 



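(up to a scaling factor, close to 1 for typical $j^l$), is almost independent from the source ensemble $\{\sigma_k, q_k\}$ in (CP). More precisely,

$$\sum_{k \nu j^l} \left| q_k x_\nu \mathrm{Tr}\left( \sigma_k A^{(\nu)}_{j^l} \right) - q_k \mathrm{Tr}\left( \sigma_k a_{j^l} \right) \Pr\{N = \nu | j^l\} \right| \le \epsilon,$$

and in fact, we even have

$$\sum_{\nu j^l} \frac{1}{2} \left\| x_\nu \sqrt{\omega} A^{(\nu)}_{j^l} \sqrt{\omega} - \Pr\{N = \nu | j^l\} \sqrt{\omega}\, a_{j^l} \sqrt{\omega} \right\|_1 \le \epsilon.$$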



This means that one can reproduce the statistics of the whole diagram in figure || from the outcomes of $a^{\otimes l}$, by inventing the $\nu$ distributed according to eq. (27). This gives a new view on the extrinsic/intrinsic separation: rather than replacing the original POVM by a fancy construction, one can from the original data $j^l$ compute the extrinsic data $\nu$, and conditional on that the intrinsic part. Then one can successfully pretend that this separation was delivered by the mixture of the POVMs $A^{(\nu)}$. 



A. Data vs. Information 

One (as it turns out, rather careless) interpretation of our result could be that the "useful" information produced by the POVM $a$ amounts to $I(\lambda; \hat\rho)$. This in itself is not yet precise, so let us fix "information" to mean "communicable information" in the sense of Shannon [24]: for any source $\{\sigma_i, p_i\}$ with average $\sum_i p_i \sigma_i = \rho$ the source and measurement outcome are random variables $X$ and $Y$ with a joint distribution

$$\Pr\{X = i, Y = j\} = p_i \mathrm{Tr}(\sigma_i a_j),$$

and the mutual information of these is

$$I(X \wedge Y) = H(X) + H(Y) - H(XY).$$

We repeat here the discussion of [28] regarding the relation between this quantity and $I(\lambda; \hat\rho)$: 

Observe first that the joint distribution of $X$ and $Y$ can be rewritten as

$$\Pr\{X = i, Y = j\} = \mathrm{Tr}\left( \rho^{-1/2} p_i \sigma_i \rho^{-1/2} \sqrt{\rho}\, a_j \sqrt{\rho} \right) = \lambda_j \mathrm{Tr}(\hat\rho_j S_i),$$

where the $S_i = \rho^{-1/2} p_i \sigma_i \rho^{-1/2}$ form a POVM (compare [16] where this correspondence between POVMs and ensembles was used to classify the latter with given density matrix). But here the Holevo bound [12] applies, with the ensemble $\{\hat\rho_j, \lambda_j\}$, and thus we have proved: 



Theorem 11 Let $\{\sigma_i, p_i\}$ be any ensemble whose average state $\sum_i p_i \sigma_i$ equals $\rho$. Define random variables $X$, $Y$ with joint distribution

$$\Pr\{X = i, Y = j\} = p_i \mathrm{Tr}(\sigma_i a_j)$$

(this is the probability for $\sigma_i$ to occur and that $j$ is observed on this state). Then

$$I(X \wedge Y) \le I(\lambda; \hat\rho).$$

□ 
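A small numerical illustration of theorem 11 may be helpful. The sketch below rests on assumptions of our own choosing, not taken from the text: a qubit with $\rho = \frac{1}{2} 1$, the real "trine" POVM $a_j = \frac{2}{3} |v_j\rangle\langle v_j|$, and the source $\{|0\rangle\langle 0|, |1\rangle\langle 1|\}$ with equal probabilities. For $\rho = \frac{1}{2} 1$ one has $\lambda_j \hat\rho_j = a_j / 2$, which the code uses directly.

```python
import math

def h(probs):
    """Shannon entropy in bits; entries that are numerically zero are skipped."""
    return -sum(q * math.log2(q) for q in probs if q > 1e-15)

def eigvals_sym2(m):
    """Eigenvalues of a real symmetric 2x2 matrix."""
    mean = (m[0][0] + m[1][1]) / 2
    rad = math.sqrt(((m[0][0] - m[1][1]) / 2) ** 2 + m[0][1] ** 2)
    return [mean + rad, mean - rad]

# Trine POVM a_j = (2/3)|v_j><v_j| with v_j = (cos t_j, sin t_j) (assumed example).
angles = [2 * math.pi * j / 3 for j in range(3)]
povm = [[[2 / 3 * math.cos(t) ** 2, 2 / 3 * math.cos(t) * math.sin(t)],
         [2 / 3 * math.cos(t) * math.sin(t), 2 / 3 * math.sin(t) ** 2]]
        for t in angles]

# Source: |0><0| and |1><1| with probability 1/2 each, so that rho = 1/2.
p = [0.5, 0.5]
joint = [[p[i] * povm[j][i][i] for j in range(3)] for i in range(2)]

# Shannon mutual information I(X ^ Y) = H(X) + H(Y) - H(XY).
h_x = h(p)
h_y = h([joint[0][j] + joint[1][j] for j in range(3)])
h_xy = h([joint[i][j] for i in range(2) for j in range(3)])
shannon_info = h_x + h_y - h_xy

# Holevo quantity I(lambda; rho_hat) = H(rho) - sum_j lambda_j H(rho_hat_j),
# using lambda_j rho_hat_j = a_j / 2 and H(rho) = 1 bit for rho = 1/2.
lam = [(a[0][0] + a[1][1]) / 2 for a in povm]
rho_hat = [[[a[r][c] / (2 * l) for c in range(2)] for r in range(2)]
           for a, l in zip(povm, lam)]
holevo = 1.0 - sum(l * h(eigvals_sym2(r)) for l, r in zip(lam, rho_hat))
```

Here the $\hat\rho_j$ are pure, so the data rate $I(\lambda; \hat\rho)$ equals $H(\rho)$, while the mutual information for this particular source stays strictly below it.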



Note that in general maximization over the ensemble $\{\sigma_i, p_i\}$ (yielding the accessible information

$$J_\rho(a) = I_{\mathrm{acc}}(\lambda; \hat\rho),$$

because in the above proof it corresponds to an information maximization over the POVM $S_i$) does not achieve the upper bound: see Q, where it is shown that it does if and only if all the $\hat\rho_j$ commute.

Furthermore, by a result from [11]

$$J_{\rho^{\otimes l}}(a^{\otimes l}) = l\, J_\rho(a),$$

hence the gap remains even asymptotically! For further discussion of this point we refer the reader to [28], section VII C. We record here only the consequence that one 
ought to distinguish between data (collected by measure- 
ment) and information (about a property of the states): 
the latter is never larger than the former, and typically 
in quantum situations it is strictly less. However, this 
seems nothing to worry about: after all, this is an ob- 
servation quite familiar from our experience, though it is 
worth stressing that in the present context it is a purely 
quantum phenomenon. 

Peter Shor has remarked the notable fact that in the presence of entanglement, however, this distinction disappears: the entanglement-assisted capacity || for the quantum-classical channel that is represented by our POVM, i.e. $\varphi_*$ from eq. (||), with the average of the sent symbols required to be $\rho$ (this means that in the formula for the entanglement-assisted capacity one has to put a purification of $\rho$) coincides with our $I(\lambda; \hat\rho)$! In fact, our result can be understood as a weak version of the conjectured "Quantum Reverse Shannon Theorem" ||, for quantum-classical channels. 

To end this part of the discussion note that the bound of theorem ^] in the case of a maximally refined measurement is simply the von Neumann entropy $H(\rho)$ of the source, and this regardless of the nature of the POVM and of the source. In this sense, there is "democracy among measurements", at least the maximally refined ones.

It is thus appealing to view our result as a dual to the creation of a density operator by mixing pure states: it is well known that in any representation $\rho = \sum_i p_i \sigma_i$, with pure states $\sigma_i$, $H(p) \ge H(\rho)$, with equality iff the $\sigma_i$ are mutually orthogonal eigenstates of $\rho$: hence, $H(\rho)$ is the minimum entropy needed to generate $\rho$. In the present work we identify $H(\rho)$ as the maximum entropy of measurement data correlated to $\rho$. 
B. Holevo bound

Here we show how to turn around the previous argument to actually prove the Holevo information bound. The statement is as follows:

Theorem 12 (Holevo [12]) Let $\{\hat\rho_j, \lambda_j\}$ be an ensemble of states with average $\rho = \sum_j \lambda_j \hat\rho_j$, and $(S_i)_{i=1,\ldots,n}$ a POVM. Define the joint distribution of random variables $Y$, $X$ to be

$$\Pr\{Y = j, X = i\} = \lambda_j \mathrm{Tr}(\hat\rho_j S_i). \qquad (28)$$

Then the inequality

$$I(Y \wedge X) \le I(\lambda; \hat\rho) = H(\rho) - \sum_j \lambda_j H(\hat\rho_j)$$

holds. 



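Proof. To begin with, observe that eq. (28) may be rewritten as

$$\Pr\{Y = j, X = i\} = p_i \mathrm{Tr}(\sigma_i a_j),$$

with $a_j = \rho^{-1/2} \lambda_j \hat\rho_j \rho^{-1/2}$ and the ensemble $\{\sigma_i, p_i\}$, where $p_i \sigma_i = \sqrt{\rho}\, S_i \sqrt{\rho}$. Now consider i.i.d. realizations $X_1, Y_1, \ldots, X_l, Y_l$ of the pair $X, Y$. We shall apply theorem H to $a^{\otimes l}$ and $\rho^{\otimes l}$, with parameter $0 < \epsilon < 1$. Hence, for $A = \sum_\nu x_\nu A^{(\nu)}$ and the ensemble $\{\sigma_{i^l}, p_{i^l}\}$ the condition (CP) holds. Let us define random variables $\xi, \eta$ by

$$\Pr\{\eta = j^l, \xi = i^l\} = p_{i^l} \mathrm{Tr}(\sigma_{i^l} A_{j^l}).$$

Then we may calculate (with $f(\epsilon) := \epsilon(\log m + 2\log n)$)

$$\begin{aligned}
l\, I(Y \wedge X) &= I(X^l \wedge Y^l) \\
&\le I(\xi \wedge \eta) + l f(\epsilon) + 4 \\
&\le I(\xi \wedge \nu\mu) + l f(\epsilon) + 4 \\
&= I(\xi \wedge \nu) + I(\xi \wedge \mu | \nu) + l f(\epsilon) + 4 \\
&\le 0 + \log M + l f(\epsilon) + 4 \\
&\le l\, I(\lambda; \hat\rho) + O(\sqrt{l}) + l f(\epsilon) + 4.
\end{aligned}$$

Only classical entropy relations have been used: line 2 is by lemma 13 stated below, line 3 is by data processing, as $\eta$ is a function of $\nu$ and $\mu$, line 4 is a standard identity, and line 5 by independence of $\nu$ and $\xi$ and the standard inequality $I(\xi \wedge \mu | \nu) \le H(\mu)$.

Now divide by $l$ and let $l \to \infty$:

$$I(Y \wedge X) \le I(\lambda; \hat\rho) + \epsilon(\log m + 2\log n).$$

As $\epsilon > 0$ was arbitrary, the theorem follows. □ 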

Lemma 13 (Fano [7]) Let $P$ and $Q$ be probability distributions on a set with finite cardinality $a$, such that $\frac{1}{2} \|P - Q\|_1 \le \Delta$. Then

$$|H(P) - H(Q)| \le \Delta \log a + 2 H(\Delta, 1 - \Delta).$$
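The continuity bound of lemma 13 is easy to probe numerically. The sketch below is merely a sanity check under our own assumptions (random distributions on a five-letter alphabet, logarithms to base 2, and $\Delta$ taken to be exactly $\frac{1}{2} \|P - Q\|_1$); all function names are hypothetical.

```python
import math
import random

def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(q * math.log2(q) for q in dist if q > 0)

def random_dist(a, rng):
    """A random probability distribution on a letters."""
    w = [rng.random() for _ in range(a)]
    total = sum(w)
    return [x / total for x in w]

def continuity_gap(p, q):
    """Return (|H(P) - H(Q)|, bound) with Delta = (1/2)||P - Q||_1."""
    a = len(p)
    delta = 0.5 * sum(abs(x - y) for x, y in zip(p, q))
    lhs = abs(entropy(p) - entropy(q))
    bound = delta * math.log2(a) + 2 * entropy([delta, 1 - delta])
    return lhs, bound

rng = random.Random(0)  # fixed seed so that the run is reproducible
results = [continuity_gap(random_dist(5, rng), random_dist(5, rng))
           for _ in range(1000)]
violations = sum(1 for lhs, bound in results if lhs > bound + 1e-12)
```

No violation is expected, since the stated bound is weaker than the sharp continuity estimate for the Shannon entropy.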



The reader may want to compare this proof to our 
earlier one in psfl : despite similarities they are concep- 
tually completely different! In fact, there we introduced 
the Holevo mutual information as a certain fidelity mea- 
sure (which may seem slightly artificial) and applied the- 
orem |[ while here we directly exploit the "bottleneck" 
nature of our main result (compare again fig. ^), thus 
providing a much more natural approach. 



C. Fixed source ensemble and classical case 

Our approach has concentrated on universal properties of the POVM, leaving the source as free as possible. What happens if we fix the source $\{\rho_i, p_i\}$? Note firstly that the whole situation is fully classical now, as we only have to regard the correlation between source issues $X = i$ and measurement results $Y = j$.

Thus it is modelled by the classical case of the initial problem: the source is $\{|i\rangle\langle i|, p_i\}$ and the POVM $b$ consists of operators

$$b_j = \sum_i \mathrm{Tr}(\rho_i a_j) |i\rangle\langle i|.$$

This model has the same joint statistics of i and j as 
the above described one (most generally, bj can be any 
operator with eigenbasis 

Now observe the following: as long as the POVMs $A^{(\nu)}$ are diagonal in the basis $\{|i\rangle\}$ too (this is the classicality condition for the POVMs), the validity of (CP) for all ensembles with average

$$P = \sum_i p_i |i\rangle\langle i|$$

is implied by its validity for the ensemble $\{|i\rangle\langle i|, p_i\}$. This is because source states $\rho_{i'}$ and $\sum_i |i\rangle\langle i| \rho_{i'} |i\rangle\langle i|$ produce the same statistics, so only sources consisting of mixtures of the $|i\rangle\langle i|$ have to be considered. The condition (CP) for them clearly is implied by its validity for $\{|i\rangle\langle i|, p_i\}$.

At this point theorems || and [sj can be applied: because the induced ensemble for source state $P$ and POVM $b$ is $\{\hat\sigma_j, \lambda_j\}$, with

$$\lambda_j = \sum_i p_i \mathrm{Tr}(\rho_i a_j) = \mathrm{Tr}(\rho a_j),$$
$$\hat\sigma_j = \sum_i \frac{1}{\lambda_j} p_i \mathrm{Tr}(\rho_i a_j) |i\rangle\langle i|,$$

we obtain $I(X \wedge Y)$, that is the Shannon mutual information between the source and the measurement, as the rate of intrinsic data. More precisely, we can perform a data separation by postprocessing, according to the prescription of the beginning of this section, eq. (27), into extrinsic $\nu$, almost independent of $i^l$, and intrinsic $j^l$ depending on $i^l$ and $\nu$. 
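That the Holevo quantity of the diagonal ensemble $\{\hat\sigma_j, \lambda_j\}$ coincides with the Shannon mutual information can be verified directly. In the sketch below the source probabilities $p_i$ and the transition matrix $q(j|i) = \mathrm{Tr}(\rho_i a_j)$ are made-up numbers chosen for illustration only; since all operators are diagonal, von Neumann entropies reduce to Shannon entropies of the diagonals.

```python
import math

def h(probs):
    """Shannon entropy in bits."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

# Hypothetical source probabilities p_i and transition matrix q[i][j] = Tr(rho_i a_j).
p = [0.5, 0.3, 0.2]
q = [[0.8, 0.2],
     [0.1, 0.9],
     [0.5, 0.5]]
n, m = len(p), len(q[0])

# lambda_j = sum_i p_i Tr(rho_i a_j), the outcome distribution.
lam = [sum(p[i] * q[i][j] for i in range(n)) for j in range(m)]

# Diagonal of sigma_hat_j: the posterior distribution of i given outcome j.
sigma_hat = [[p[i] * q[i][j] / lam[j] for i in range(n)] for j in range(m)]

# Holevo quantity of {sigma_hat_j, lambda_j}; the average state P has diagonal p.
holevo = h(p) - sum(lam[j] * h(sigma_hat[j]) for j in range(m))

# Shannon mutual information I(X ^ Y) = H(X) + H(Y) - H(XY).
joint = [p[i] * q[i][j] for i in range(n) for j in range(m)]
mutual_info = h(p) + h(lam) - h(joint)
```

The two numbers agree up to rounding, which is just the identity $H(X) - H(X|Y) = I(X \wedge Y)$.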

However, this is not exactly what we set out to do initially: 
theorem |^ allows us to decompose the $b_{j^l}$ into convex 
combinations of operators 



such that the latter's distribution is recovered as a con- 
volution; in terms of stochastic maps q is factorized into 
q and r: 



but it is not clear that these can be obtained from 
POVMs AW, in the sense that 



ViA/j'W = Ti- 



ft' 



A 



O) 



For this to hold the vectors 



(for all j ) must 



belong to the cone spanned by the vectors $(\langle\psi|\rho_{i'}|\psi\rangle)_{i'}$. It is conceivable that under this condition the obtainable intrinsic data rate increases. We have to leave this interesting question for the moment. 

For classical sources and measurements we thus obtain that intrinsic data equals mutual information. On the other hand, we can come back to their being distinct in truly quantum situations: we pointed out in subsection VI A that the maximum of $I(X \wedge Y)$ over all sources with average $\rho$ gives the accessible information $I_{\mathrm{acc}}(\lambda; \hat\rho)$ of the ensemble $\{\hat\rho_j, \lambda_j\}$, which in general is less than $I(\lambda; \hat\rho)$. The difference can be accounted for by considering that the sources in this maximization are of the special i.i.d. type (on $l$-blocks), while (CM) implies (CP) even for sources of entangled states, as long as their average is $\omega = \rho^{\otimes l}$. This should be viewed especially in the light of the conjecture implied in subsection VI F. 



D. Sufficient statistics 

The reader familiar with classical statistical theories may have been reminded by our above discussion of the concept of sufficient statistics, at least when the quantum source and the observation are essentially classical, i.e. when all the $\rho_i$ and $a_j$ commute: the former are then just probability distributions and the latter form a statistical decision rule, with distribution of $j$ conditional on $i$ denoted $q(j|i)$. As there is also a distribution $p_i$ on the $i$ we have here a statistical model in the sense of estimation theory (we refer the reader to [18] for detailed explanations). 

We will consider the values of $i$ and $j$ as random variables: then a sufficient statistics is a random variable $k$ which is a function of $j$ (whose distribution conditional on $i$ we denote $q(k|i)$), such that the distribution of $j$ conditional on $k$ is independent of $i$:

$$\Pr\{j|k\} = \Pr\{j|k,i\}.$$

Let us denote these conditional probabilities by $r(j|k)$.

This implies that we can simulate the distribution of $j$ conditional on $i$ from $k$:

$$q(j|i) = \sum_k r(j|k) q(k|i).$$
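The factorization through a sufficient statistic can be made concrete on a tiny model. The numbers below are invented for illustration (two values of $i$, four values of $j$, and the statistic $k = \lfloor j/2 \rfloor$); exact rational arithmetic is used so that the identity can be checked without rounding.

```python
from fractions import Fraction as F

# Hypothetical model q(j|i): within each block {2k, 2k+1} the probability
# is split in the same proportion for every i, which makes k sufficient.
q_j_given_i = [
    [F(3, 10), F(3, 10), F(1, 10), F(3, 10)],   # i = 0
    [F(1, 10), F(1, 10), F(1, 5), F(3, 5)],     # i = 1
]
num_i, num_j, num_k = 2, 4, 2

def statistic(j):
    """The statistic k as a function of j."""
    return j // 2

# q(k|i): distribution of the statistic conditional on i.
q_k_given_i = [[sum(row[j] for j in range(num_j) if statistic(j) == k)
                for k in range(num_k)] for row in q_j_given_i]

def r_from(i):
    """r(j|k) = Pr{j | k, i}, computed from the row belonging to i."""
    return [[q_j_given_i[i][j] / q_k_given_i[i][k] if statistic(j) == k else F(0)
             for k in range(num_k)] for j in range(num_j)]

# Sufficiency: the conditional distribution of j given k does not depend on i.
r = r_from(0)
is_sufficient = (r == r_from(1))

# Simulation identity q(j|i) = sum_k r(j|k) q(k|i).
reconstructed = [[sum(r[j][k] * q_k_given_i[i][k] for k in range(num_k))
                  for j in range(num_j)] for i in range(num_i)]
```

Changing a single entry of `q_j_given_i` so that the within-block proportions differ between the two rows makes `is_sufficient` false, and the reconstruction then fails for at least one $i$.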



In words, to each entry $k$ of the new data record there exists a distribution on the $j$ of the original data record

$$i \longrightarrow k \longrightarrow j.$$

On the other hand, our theorem |^ provides something appearing to be dual to this (apart from holding only approximately and in an asymptotic setting; these things are easily introduced in sufficient statistics, too): a random variable $\nu$ with distribution $x$, independent of $i$ and $j$, and conditional on it a stochastic map $a_\nu(j|i)$ such that

$$q(j|i) = \sum_\nu x_\nu a_\nu(j|i).$$



In a diagram:

$$\begin{array}{c} \nu \\ \downarrow \\ i \longrightarrow \mu \longrightarrow j. \end{array}$$

Like $k$ in the case of sufficient statistics, the pair $\mu\nu$ is a function of $j$, but unlike there, where $q$ and $r$ were stochastic maps with independent sources of randomness (when stochastic maps are viewed as set function valued random variables, this is expressed by the independence of $q$ and $r$), the maps $Q$ and $R$ draw their randomness from the same source $\nu$.

In summary, there is no direct isomorphism between our concept of data reduction and sufficient statistics (which, too, can be used to reduce the entropy of data sets): the latter appears as a special case where the maps $Q$ and $R$ are independent. 



E. Entropy exchange 

We want to discuss an application of theorem 10 to the entropy exchange of quantum operations, introduced by Schumacher [22] (and previously by Lindblad ^0|): for a quantum operation $\varphi_*$ in the form (21) it is defined as

$$S_e(\rho; \varphi_*) = H(W), \quad \text{with } W_{jk} = \mathrm{Tr}(V_j \rho V_k^*).$$

It can be shown to be independent of the Kraus representation, by identifying it with the entropy increase in an initially pure environment of the system by a Stinespring dilation of $\varphi_*$, see [22]. In the latter work a number of interesting relations between $S_e$ and other entropic quantities are shown. 

In particular, returning to the notation of the previous section, it is shown that there is a (in this sense, minimal) Kraus representation of $\varphi_*$ such that $H(\lambda) = S_e(\rho; \varphi_*)$. Because of $I(\lambda; \hat\rho) \le H(\lambda)$ (this is simply data processing inequality M), we conclude

$$\Sigma(\rho; \varphi_*) \le S_e(\rho; \varphi_*).$$
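To see the entropy exchange at work, one may compute $W$ for two Kraus representations of the same map. The sketch below is an illustration under assumptions of ours only: a qubit dephasing map with real Kraus operators, the source state $\rho = \mathrm{diag}(0.7, 0.3)$, and hypothetical helper names. It checks that $H(W)$ is the same for both representations, whereas $H(\lambda)$, the entropy of the diagonal of $W$, depends on the representation and is never smaller than $S_e$.

```python
import math

def matmul(a, b):
    """Product of two real 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def h(probs):
    """Shannon entropy in bits; numerically zero entries are skipped."""
    return -sum(q * math.log2(q) for q in probs if q > 1e-15)

def eigvals_sym2(m):
    """Eigenvalues of a real symmetric 2x2 matrix."""
    mean = (m[0][0] + m[1][1]) / 2
    rad = math.sqrt(((m[0][0] - m[1][1]) / 2) ** 2 + m[0][1] ** 2)
    return [mean + rad, mean - rad]

def exchange_matrix(kraus, rho):
    """W_jk = Tr(V_j rho V_k^*) for two real Kraus operators."""
    w = []
    for vj in kraus:
        row = []
        for vk in kraus:
            m = matmul(matmul(vj, rho), transpose(vk))
            row.append(m[0][0] + m[1][1])
        w.append(row)
    return w

p = 0.1                      # hypothetical dephasing strength
a, b = math.sqrt(1 - p), math.sqrt(p)
kraus_a = [[[a, 0.0], [0.0, a]], [[b, 0.0], [0.0, -b]]]      # sqrt(1-p) 1, sqrt(p) Z
c = 1 / math.sqrt(2)
kraus_b = [[[c * (a + b), 0.0], [0.0, c * (a - b)]],          # (V_1 + V_2)/sqrt(2)
           [[c * (a - b), 0.0], [0.0, c * (a + b)]]]          # (V_1 - V_2)/sqrt(2)
rho = [[0.7, 0.0], [0.0, 0.3]]

w_a = exchange_matrix(kraus_a, rho)
w_b = exchange_matrix(kraus_b, rho)
s_e_a = h(eigvals_sym2(w_a))           # entropy exchange from representation a
s_e_b = h(eigvals_sym2(w_b))           # entropy exchange from representation b
h_lam_a = h([w_a[0][0], w_a[1][1]])    # H(lambda) for representation a
h_lam_b = h([w_b[0][0], w_b[1][1]])    # H(lambda) for representation b
```

The inequality $H(\lambda) \ge H(W)$ used here is the familiar fact that the diagonal of a density matrix is at least as mixed as its spectrum.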
By the derivation this quantity may be dubbed genuinely quantum entropy exchange of a channel, as it is that part of the noise that cannot be accounted for classically.

From a different point of view, in fact also the maximum of $I(\lambda; \hat\rho)$ over all Kraus representations of $\varphi_*$ (compare eq. (26)) is interesting: in a cryptographic setting, where $\varphi_*$ connects users A and B, and is controlled by an eavesdropper E, it is the amount of data collected by E about A's messages in the worst case.

A deeper investigation of these concepts is relegated to another occasion. 



F. An open problem

An interesting and challenging question is about the amount of data collected by $a$ under the hypothesis of an arbitrarily varying source (AVS), instead of the i.i.d. model considered here:

An AVS is a collection of source ensembles $\{\rho_{is}, p_{is}\}$ (with average state $\rho_s$), labelled by $s \in S$, which we make into a discrete memoryless source by considering the ensembles (labelled by $s^l \in S^l$)

$$\{\rho_{i^l s^l}, p_{i^l s^l}\}_{i^l}.$$

The idea is that at each position $k = 1, \ldots, l$ the source may be arbitrarily in one of the internal states $s \in S$. We have no information about $s$, not even statistical, so our data separation must work for all $s^l \in S^l$: formally the condition on $A = \sum_\nu x_\nu A^{(\nu)}$ is

$$\forall s^l \in S^l: \quad \sum_{j^l} \frac{1}{2} \left\| \sqrt{\omega(s^l)}\, (a_{j^l} - A_{j^l}) \sqrt{\omega(s^l)} \right\|_1 \le \epsilon, \qquad \text{(AVCM)}$$

where $\omega(s^l) = \rho_{s_1} \otimes \cdots \otimes \rho_{s_l}$ is the average state of the source when in internal state $s^l$.

A natural candidate for the minimum data rate of the $A^{(\nu)}$ seems to be

$$\max\{ I(\lambda; \hat\rho) : \rho \in \mathrm{conv}\{\rho_s : s \in S\} \},$$

with $\lambda_j \hat\rho_j = \sqrt{\rho}\, a_j \sqrt{\rho}$, and conv denoting the closed convex hull.

If this is true, then in particular the quantity

$$A(a) = \max_\rho I(\lambda; \hat\rho)$$

is the amount of data collected by $a$, regardless of any source ensemble.

the possible relation between the present approach and sufficient statistics. Thanks to Peter Shor who supplied the insight that the difference between data and information disappears in the presence of entanglement. I thank Masanao Ozawa for pointing out to me that theorem 10, initially only formulated for operations, is in fact valid for instruments. Part of this work was done during my stay at the ERATO project "Quantum Computation and Information", Tokyo (August/September 2001). I thank the members of the project for their hospitality, and especially Keiji Matsumoto for discussions on the content of the appendix, on which I also enjoyed conversation with Richard Jozsa and Masahide Sasaki. Last but not least, special thanks are due to Marco P. Carota for constant encouragement during the course of this work. 
APPENDIX A: CANONICAL PURIFICATIONS

In this appendix we collect a few facts about mixed state fidelity and a certain kind of purification of mixed states, which we call canonical, that seem not to be widely known. These are used in the main text, but seem to be of interest in their own right.

For the state $\omega$ on $\mathcal{H}_1$ consider a purification $|\psi\rangle = \sum_i \sqrt{r_i}\, |i\rangle \otimes |i\rangle$ on a bipartite system $\mathcal{H}_1 \otimes \mathcal{H}_2$, that we already have put in Schmidt polar form. Then on both systems there exist ($\mathbb{R}$-linear) complex conjugation maps with respect to the basis $\{|i\rangle\}$:

$$\overline{\sum_i \alpha_i |i\rangle} = \sum_i \bar\alpha_i |i\rangle.$$

Then, with $|I\rangle = \sum_i |i\rangle \otimes |i\rangle$, it can be checked that

$$|\psi\rangle\langle\psi| = (\sqrt{\omega} \otimes 1) |I\rangle\langle I| (\sqrt{\omega} \otimes 1) = (1 \otimes \sqrt{\omega}) |I\rangle\langle I| (1 \otimes \sqrt{\omega}),$$

see also the following lemma 14. Then 



(i. 



%)\<l>M(*-®VSk j 

= q k (l ® U k )[(l ® ® y/rft] (1 ® U* k ) 

= q k {t®U k )\t k ){t k \{t®U* k ) 1 

the third line introducing $q_k \tau_k = \sqrt{\omega}\, S_k \sqrt{\omega}$ on $\mathcal{H}_2$, and the polar decomposition $\sqrt{S_k} \sqrt{\omega} = U_k \sqrt{q_k \tau_k}$, the fourth the canonical purification $|t_k\rangle$ on $\mathcal{H}_1 \otimes \mathcal{H}_2$ of $\tau_k$ (with respect to $|I\rangle\langle I|$), see lemma 14 below. By this lemma we can infer 



Tr W2 ®S k )] =g fc Tr W2 



t k )(t k 



with the complex conjugated operator r k , which is de- 
fined as 



Tk 



\<f>i){<f>i\, if T fc 
Note that this is uniquely defined, regardless of the convex decomposition chosen, and in particular independent of the phases of the $|\phi_i\rangle$.

The ensemble $\{\bar\tau_k, q_k\}$ has average $\bar\omega = \omega$, and conversely, the above formulas show how to induce any ensemble $\{\sigma_k, q_k\}$ for $\omega$ on $\mathcal{H}_1$: let $S_k = \omega^{-1/2} q_k \bar\sigma_k \omega^{-1/2}$ (this was noted before in [16] in the context of classifying ensembles with a given density operator). 

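Lemma 14 ("Pretty good purifications") Consider orthonormal bases of spaces $\mathcal{H}_1$ and $\mathcal{H}_2$, both denoted $\{|i\rangle\}$, and introduce $|I\rangle = \sum_i |i\rangle \otimes |i\rangle$. As before, we denote the complex conjugation with respect to this basis by an overline. Then for a state $\rho = \sum_i \alpha_i |\psi_i\rangle\langle\psi_i|$ (in diagonalized form),

$$|r\rangle\langle r| = (\sqrt{\rho} \otimes 1) |I\rangle\langle I| (\sqrt{\rho} \otimes 1),$$

with

$$|r\rangle = \sum_i \sqrt{\alpha_i}\, |\psi_i\rangle \otimes |\bar\psi_i\rangle,$$

is a purification of $\rho$. We call it the canonical purification with respect to $|I\rangle$. (Note that this definition makes sense as it is independent of phases in the $|\psi_i\rangle$.)

If $|s\rangle\langle s|$ is the canonical purification of another state $\sigma$ then for the fidelity between these:

$$F(|r\rangle, |s\rangle) = |\langle r|s\rangle|^2 = \left( \mathrm{Tr}\sqrt{\rho}\sqrt{\sigma} \right)^2. \qquad \text{(A1)}$$

Furthermore

$$1 - \mathrm{Tr}\sqrt{\rho}\sqrt{\sigma} \le \sqrt{\|\rho - \sigma\|_1}, \qquad \text{(A2)}$$
$$\left\| |r\rangle\langle r| - |s\rangle\langle s| \right\|_1 \le 2\sqrt{2}\, \sqrt[4]{\|\rho - \sigma\|_1}. \qquad \text{(A3)}$$

Proof. The formula for the canonical purification is a straightforward calculation. With its help, it is also straightforward to check the fidelity identity, eq. (A1). Now for the last two estimates: begin with

$$\begin{aligned}
1 - \mathrm{Tr}\sqrt{\rho}\sqrt{\sigma} &= \mathrm{Tr}\left( \sqrt{\rho} (\sqrt{\rho} - \sqrt{\sigma}) \right) \\
&\le \left\| \sqrt{\rho} (\sqrt{\rho} - \sqrt{\sigma}) \right\|_1 \\
&\le \|\sqrt{\rho}\|_2 \, \|\sqrt{\rho} - \sqrt{\sigma}\|_2 \\
&\le \|\sqrt{\rho}\|_2 \sqrt{\|\rho - \sigma\|_1} \\
&= \sqrt{\|\rho - \sigma\|_1},
\end{aligned}$$

invoking two nontrivial inequalities: in the third line we use Cor. IV.2.6 of f§ (which is a kind of Hölder or Cauchy-Schwarz inequality), in the fourth line Thm. X.1.3 from the same book.

Finally, use the well known identity

$$\frac{1}{2} \left\| |r\rangle\langle r| - |s\rangle\langle s| \right\|_1 = \sqrt{1 - F(|r\rangle, |s\rangle)}$$

to obtain

$$\begin{aligned}
\frac{1}{2} \left\| |r\rangle\langle r| - |s\rangle\langle s| \right\|_1 &= \sqrt{1 - |\langle r|s\rangle|^2} \\
&\le \sqrt{2} \sqrt{1 - |\langle r|s\rangle|} \\
&\le \sqrt{2}\, \sqrt[4]{\|\rho - \sigma\|_1},
\end{aligned}$$

which we wanted to show. □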
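The statements of lemma 14 can be tested on a pair of qubit states. The following sketch makes several simplifying assumptions of ours: real symmetric $2 \times 2$ density matrices (so complex conjugation is trivial), the closed form $\sqrt{A} = (A + \sqrt{\det A}\, 1) / \sqrt{\mathrm{Tr} A + 2\sqrt{\det A}}$ for positive $2 \times 2$ matrices, and two arbitrarily chosen example states.

```python
import math

def sqrt_psd2(a):
    """Square root of a real positive semidefinite 2x2 matrix via
    sqrt(A) = (A + s*1) / t, with s = sqrt(det A) and t = sqrt(Tr A + 2 s)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    s = math.sqrt(max(det, 0.0))
    t = math.sqrt(a[0][0] + a[1][1] + 2 * s)
    return [[(a[0][0] + s) / t, a[0][1] / t],
            [a[1][0] / t, (a[1][1] + s) / t]]

def canonical_purification(rho):
    """The vector (sqrt(rho) (x) 1)|I>; its component (a, i) is sqrt(rho)[a][i]."""
    root = sqrt_psd2(rho)
    return [root[a][i] for a in range(2) for i in range(2)]

def trace_norm_diff(rho, sigma):
    """||rho - sigma||_1 for real symmetric 2x2 matrices of equal trace."""
    d00 = rho[0][0] - sigma[0][0]
    d01 = rho[0][1] - sigma[0][1]
    return 2 * math.sqrt(d00 ** 2 + d01 ** 2)

rho = [[0.8, 0.1], [0.1, 0.2]]        # example states, chosen arbitrarily
sigma = [[0.6, -0.2], [-0.2, 0.4]]
r = canonical_purification(rho)
s = canonical_purification(sigma)

# Overlap <r|s> versus Tr sqrt(rho) sqrt(sigma), cf. eq. (A1).
overlap = sum(x * y for x, y in zip(r, s))
root_r, root_s = sqrt_psd2(rho), sqrt_psd2(sigma)
tr_roots = sum(root_r[a][b] * root_s[b][a] for a in range(2) for b in range(2))

# Reduced state on the first system: sum_i r[a,i] r[b,i] should give rho back.
reduced = [[sum(r[2 * a + i] * r[2 * b + i] for i in range(2)) for b in range(2)]
           for a in range(2)]

dist = trace_norm_diff(rho, sigma)
# Trace norm distance of the two pure states: 2 sqrt(1 - |<r|s>|^2).
pure_dist = 2 * math.sqrt(max(0.0, 1 - overlap ** 2))
```

For these states the distance between the canonical purifications stays well inside the bound (A3); the bound is of course only interesting when $\|\rho - \sigma\|_1$ is small.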



Remark 15 Observe $\mathrm{Tr}\sqrt{\rho}\sqrt{\sigma} \le \|\sqrt{\rho}\sqrt{\sigma}\|_1$, the square of this latter quantity being known as the (mixed state) fidelity $F(\rho, \sigma)$. By theorems by Uhlmann fl^/ and Jozsa J7^/ the mixed state fidelity $F(\rho, \sigma) = \|\sqrt{\rho}\sqrt{\sigma}\|_1^2$ equals the maximum over the pure state fidelities of all possible purifications of $\rho$ and $\sigma$. Because of well known relations between mixed state fidelity and trace norm distance (see ^j), more precisely

$$1 - \sqrt{F(\rho, \sigma)} \le \frac{1}{2} \|\rho - \sigma\|_1 \le \sqrt{1 - F(\rho, \sigma)}, \qquad \text{(A4)}$$

the lemma tells us that at least for (mixed state) fidelity close to 1 the canonical purifications are not too far off the optimum with respect to (pure state) fidelity. 



APPENDIX B: TYPICAL SEQUENCES AND 
SUBSPACES 



For a probability distribution $P$ on the finite set $\mathcal{X}$ define the set of typical sequences (with $\delta > 0$)

$$\mathcal{T}^l_{P,\delta} = \left\{ x^l : \forall x \ \left| N(x|x^l) - l P_x \right| \le \delta \sqrt{l} \sqrt{P_x (1 - P_x)} \right\},$$

where $N(x|x^l)$ counts the number of occurrences of $x$ in the word $x^l = x_1 \ldots x_l$. 
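The membership condition for $\mathcal{T}^l_{P,\delta}$ translates directly into a few lines of code. The sketch below is for illustration only; the function name and the example distribution are our own choices.

```python
import math
from collections import Counter

def is_typical(word, dist, delta):
    """Check |N(x|word) - l P_x| <= delta sqrt(l) sqrt(P_x (1 - P_x)) for all x."""
    l = len(word)
    counts = Counter(word)
    # A letter outside the alphabet of P can never occur in a typical word.
    if any(x not in dist for x in counts):
        return False
    return all(
        abs(counts.get(x, 0) - l * px)
        <= delta * math.sqrt(l) * math.sqrt(px * (1 - px))
        for x, px in dist.items()
    )

dist = {"a": 0.5, "b": 0.25, "c": 0.25}   # hypothetical distribution P
balanced = "aabc" * 25                     # l = 100 with exact letter frequencies
skewed = "a" * 100                         # l = 100, far from P
```

With $\delta = 1$ the word `balanced` passes the test, since all deviations vanish, while `skewed` fails it: the letter a occurs 100 times against an expected 50, with an allowed deviation of only 5.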

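For a state $\rho$ fix eigenstates $e_1, \ldots, e_d$ (with eigenvalues $R_1, \ldots, R_d$) and define for $\delta > 0$ the typical projector as

$$\Pi^l_{\rho,\delta} = \sum_{x^l \in \mathcal{T}^l_{R,\delta}} e_{x_1} \otimes \cdots \otimes e_{x_l}.$$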



For a collection of states $\rho_j$, $j = 1, \ldots, m$, and $j^l \in [m]^l$ define the conditional typical projector as

$$\Pi^l_{\rho,\delta}(j^l) = \bigotimes_j \Pi^{I_j}_{\rho_j,\delta},$$

where $I_j = \{k : j_k = j\}$ and $\Pi^{I_j}_{\rho_j,\delta}$ is meant to denote the typical projector of the state $\rho_j$ on the subsystem composed of the tensor factors $I_j$ in the tensor product of $l$ factors. From [27] we cite the following properties of these projectors: 



Tr(p^n^)>l 



Tr(^,n^(j'))>l 
TrOSyll^) > 1- 



d 

md 



S 2 



TrlJ l pS < exp (lH(p) + KdSVl) , 



(Bl) 
(B2) 
(B3) 

(B4) 



TrIP M > [1- 



6-2 



exp (lH(p) - KdSVtj , (B5) 



1G 



Trn|j s { 3 1 ) < exp (lH(p\P jl ) + KmdSVl) , (B6) 

Trn^(j') > (l - exp (lH(p\P f ) + KmdSVl 

(B7) 

for an absolute constant K > 0, and the empirical distri- 
bution Pji of letters j in the word j : 



1 



= exp (-IHiplPji) + KmdSVl 
0' = exp (-IHiplPj.) - KmddVtj 



we have 



(B8) 



Finally, with 



a = exp (-lH(p) - KddVl 
a' = exp (-lH(p) + K dSVl 



(B9) 